When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models
Nov 1, 2025
Cheng Wang
Gelei Deng
Xianglin Yang
Han Qiu
Tianwei Zhang
Abstract
Large Audio-Language Models have shown impressive capabilities in understanding and processing multimodal inputs. However, this work reveals a critical text bias in these models, where textual information can override or distort the understanding of audio content when the two modalities disagree. We systematically analyze this phenomenon and its implications for model reliability.
Type
Publication
Conference on Empirical Methods in Natural Language Processing (EMNLP)
This work investigates the behavior of Large Audio-Language Models when audio and text inputs provide conflicting information, revealing a systematic text bias that has important implications for model reliability and safety.