AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
February 4, 2026
Authors: Ashutosh Chaubey, Jiacheng Pang, Maksim Siniukov, Mohammad Soleymani
cs.AI
Abstract
Emotion understanding is essential for building socially intelligent agents. Although recent multimodal large language models (MLLMs) have shown strong performance on this task, two key challenges remain: spurious associations between emotions and irrelevant audiovisual cues, and hallucinations of audiovisual cues driven by text priors in the language model backbone. To quantify and understand these issues, we introduce EmoReAlM, a benchmark designed to evaluate MLLMs on cue-emotion associations, hallucinations, and modality agreement. We then propose AVEm-DPO, a preference optimization technique that aligns model responses with both audiovisual inputs and emotion-centric queries. Specifically, guided by textual prompts, we construct preferences over responses exhibiting spurious associations or hallucinations, as well as over audiovisual input pairs. We also include a regularization term that penalizes reliance on text priors, thereby mitigating modality-specific cue hallucinations. Experimental results on DFEW, RAVDESS, and EMER show that our method significantly improves the reference baseline models, with 6-19% relative performance gains in zero-shot settings. By providing both a rigorous benchmark and a robust optimization framework, this work enables principled evaluation and improvement of MLLMs for emotion understanding and social AI. Code, models, and the benchmark will be released at https://avere-iclr.github.io.
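The abstract does not give the AVEm-DPO objective in closed form, but a standard DPO loss with an added text-prior regularizer illustrates the shape such an objective could take. The sketch below is an assumption, not the paper's formulation: the function name avem_dpo_loss, the hyperparameters beta and lam, and the text-only penalty term are all hypothetical, and per-sequence log-probabilities are assumed to be precomputed.

```python
import torch
import torch.nn.functional as F


def avem_dpo_loss(
    policy_chosen_logps: torch.Tensor,      # log p_theta(y_w | x_av), shape (B,)
    policy_rejected_logps: torch.Tensor,    # log p_theta(y_l | x_av), shape (B,)
    ref_chosen_logps: torch.Tensor,         # log p_ref(y_w | x_av), shape (B,)
    ref_rejected_logps: torch.Tensor,       # log p_ref(y_l | x_av), shape (B,)
    policy_textonly_chosen_logps: torch.Tensor,  # log p_theta(y_w | text only), shape (B,)
    beta: float = 0.1,
    lam: float = 0.05,
) -> torch.Tensor:
    # Standard DPO term: prefer the chosen response y_w over the rejected
    # response y_l, with implicit rewards measured relative to the frozen
    # reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    dpo_loss = -F.logsigmoid(chosen_rewards - rejected_rewards)

    # Hypothetical text-prior regularizer: penalize chosen responses that
    # stay likely when the audiovisual input is masked out, i.e. answers
    # the backbone can produce from language priors alone.
    text_prior_penalty = lam * policy_textonly_chosen_logps

    return (dpo_loss + text_prior_penalty).mean()
```

Under this reading, the penalty term pushes the policy to ground its answers in the audiovisual evidence: a response that remains probable without the audio and video carries a cost, which matches the abstract's stated goal of discouraging reliance on text priors.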