Evaluating and Steering Modality Preferences in Multimodal Large Language Model
May 27, 2025
Authors: Yu Zhang, Jinlong Ma, Yongshuai Hou, Xuefeng Bai, Kehai Chen, Yang Xiang, Jun Yu, Min Zhang
cs.AI
Abstract
Multimodal large language models (MLLMs) have achieved remarkable performance
on complex tasks with multimodal context. However, it is still understudied
whether they exhibit modality preference when processing multimodal contexts.
To study this question, we first build an MC² benchmark under controlled evidence conflict scenarios to systematically
evaluate modality preference, which is the tendency to favor one modality over
another when making decisions based on multimodal conflicting evidence. Our
extensive evaluation reveals that all 18 tested MLLMs generally demonstrate
clear modality bias, and modality preference can be influenced by external
interventions. An in-depth analysis reveals that the preference direction can
be captured within the latent representations of MLLMs. Building on this, we
propose a probing and steering method based on representation engineering to
explicitly control modality preference without additional fine-tuning or
carefully crafted prompts. Our method effectively amplifies modality preference
toward a desired direction and applies to downstream tasks such as
hallucination mitigation and multimodal machine translation, yielding promising
improvements.
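The abstract only names the approach, but the general idea behind representation-engineering steering can be illustrated with a minimal sketch: estimate a "preference direction" from hidden states of examples where the model favored one modality, then add that direction (scaled) to a layer's activations at inference. The snippet below is an illustrative, simplified example under these assumptions and is not the authors' exact procedure; the toy layer, `alpha`, and the synthetic activations are hypothetical placeholders.

```python
# Minimal sketch of difference-of-means probing + activation-addition steering.
# NOT the paper's exact method; layer, alpha, and data here are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 64

# Toy stand-ins for hidden states collected when the model followed
# text evidence vs. image evidence (batch, hidden_dim).
text_pref_acts = torch.randn(32, hidden_dim) + 0.5
image_pref_acts = torch.randn(32, hidden_dim) - 0.5

# 1) "Probe": preference direction as the normalized difference of class means.
direction = image_pref_acts.mean(0) - text_pref_acts.mean(0)
direction = direction / direction.norm()

# 2) "Steer": shift one layer's output along the direction via a forward hook.
layer = nn.Linear(hidden_dim, hidden_dim)  # placeholder for a transformer block
alpha = 4.0                                # steering strength (hypothetical)

def steer_hook(module, inputs, output):
    # Returning a value from a forward hook replaces the layer's output.
    return output + alpha * direction

handle = layer.register_forward_hook(steer_hook)
x = torch.randn(1, hidden_dim)
steered = layer(x)        # output includes the steering offset
handle.remove()
unsteered = layer(x)      # same input, no steering

# Projection onto the direction increases after steering.
print((steered @ direction).item(), (unsteered @ direction).item())
```

In practice such a hook would be registered on a chosen decoder layer of an MLLM, with the direction estimated from contrastive examples drawn from a conflict benchmark; the difference-of-means probe is one common choice, and the scaling factor trades off steering strength against fluency.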