When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models

Nov 1, 2025
Cheng Wang, Gelei Deng, Xianglin Yang, Han Qiu, Tianwei Zhang
Abstract
Large Audio-Language Models have shown impressive capabilities in understanding and processing multimodal inputs. However, this work reveals a critical text bias in these models, where textual information can override or distort the understanding of audio content when the two modalities disagree. We systematically analyze this phenomenon and its implications for model reliability.
Type
Publication
Conference on Empirical Methods in Natural Language Processing (EMNLP)

This work investigates the behavior of Large Audio-Language Models when audio and text inputs provide conflicting information, revealing a systematic text bias that has important implications for model reliability and safety.