Evaluating and Steering Modality Preferences in Multimodal Large Language Models
May 27, 2025
Authors: Yu Zhang, Jinlong Ma, Yongshuai Hou, Xuefeng Bai, Kehai Chen, Yang Xiang, Jun Yu, Min Zhang
cs.AI
Abstract
Multimodal large language models (MLLMs) have achieved remarkable performance
on complex tasks with multimodal context. However, it is still understudied
whether they exhibit modality preference when processing multimodal contexts.
To study this question, we first build an MC² benchmark under controlled
evidence conflict scenarios to systematically
evaluate modality preference, which is the tendency to favor one modality over
another when making decisions based on multimodal conflicting evidence. Our
extensive evaluation reveals that all 18 tested MLLMs generally demonstrate
clear modality bias, and modality preference can be influenced by external
interventions. An in-depth analysis reveals that the preference direction can
be captured within the latent representations of MLLMs. Built on this, we
propose a probing and steering method based on representation engineering to
explicitly control modality preference without additional fine-tuning or
carefully crafted prompts. Our method effectively amplifies modality preference
toward a desired direction and applies to downstream tasks such as
hallucination mitigation and multimodal machine translation, yielding promising
improvements.
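
The abstract describes a probing-and-steering recipe based on representation engineering: a modality-preference direction is identified in the model's latent representations and then used to shift activations toward the desired modality at inference time, with no fine-tuning or prompt engineering. The following is a minimal sketch of that general recipe, not the paper's exact procedure; the difference-of-means probe, the layer choice, the scaling factor `alpha`, and the attribute path used to locate a decoder layer are all assumptions for illustration.

```python
# Minimal sketch of representation-based probing and steering (assumptions noted inline).
import torch
import torch.nn as nn

# --- Probing: estimate a "modality preference" direction ---------------------
# Assume hidden states at one decoder layer have already been cached for prompts
# where the model followed the textual evidence vs. the visual evidence
# (hypothetical placeholder data below).
hidden_dim = 4096
h_text_pref = torch.randn(128, hidden_dim)   # states when the answer followed the text
h_image_pref = torch.randn(128, hidden_dim)  # states when the answer followed the image

# Difference-of-means probe: points from "prefers image" toward "prefers text".
direction = h_text_pref.mean(dim=0) - h_image_pref.mean(dim=0)
direction = direction / direction.norm()

# --- Steering: shift the residual stream along the probed direction ----------
def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Forward hook that adds alpha * direction to a layer's hidden states.
    alpha > 0 pushes toward text preference, alpha < 0 toward image preference."""
    def hook(module: nn.Module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Usage (hypothetical attribute path into an MLLM loaded elsewhere):
#   layer = model.language_model.model.layers[20]
#   handle = layer.register_forward_hook(make_steering_hook(direction, alpha=4.0))
#   ... run generation with steering active ...
#   handle.remove()
```

In this sketch the steering strength `alpha` and the intervention layer would be tuned on held-out conflict examples; the paper's method should be consulted for how the direction is actually extracted and applied.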