Evaluating and Steering Modality Preferences in Multimodal Large Language Model

Abstract

Multimodal large language models (MLLMs) have achieved remarkable performance on complex tasks involving multimodal context. However, whether they exhibit modality preference when processing multimodal context remains understudied. To study this question, we first build the MC² benchmark under controlled evidence-conflict scenarios to systematically evaluate modality preference, defined as the tendency to favor one modality over another when making decisions based on conflicting multimodal evidence. Our extensive evaluation reveals that all 18 tested MLLMs generally demonstrate clear modality bias, and that modality preference can be influenced by external interventions. An in-depth analysis shows that the preference direction can be captured within the latent representations of MLLMs. Building on this finding, we propose a probing and steering method based on representation engineering that explicitly controls modality preference without additional fine-tuning or carefully crafted prompts. Our method effectively amplifies modality preference toward a desired direction and, when applied to downstream tasks such as hallucination mitigation and multimodal machine translation, yields promising improvements.
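The abstract states only that the preference direction can be captured in the latent representations and then used for probing and steering; it does not give the exact procedure. As an illustration under our own assumptions, the minimal sketch below uses a common representation-engineering recipe: take the difference of mean hidden states between text-preferring and vision-preferring examples as a unit direction, probe a new state by projecting it onto that direction, and steer by adding a scaled copy of the direction. The function names, the toy 2-D "hidden states", and the scaling factor `alpha` are hypothetical; this is not the paper's implementation, and a real model would apply the shift to a chosen layer's activations during generation.

```python
"""Hypothetical sketch of mean-difference probing and steering.

Assumption: this is a generic representation-engineering recipe on toy
vectors, not the exact method described in the paper.
"""
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def preference_direction(text_pref: Sequence[Vector],
                         vision_pref: Sequence[Vector]) -> Vector:
    """Unit vector pointing from vision-preferring toward text-preferring states."""
    return normalize(sub(mean(text_pref), mean(vision_pref)))


def probe(h: Sequence[float], direction: Vector, midpoint: Vector) -> str:
    """Label a hidden state by which side of the midpoint it projects onto."""
    return "text" if dot(sub(h, midpoint), direction) > 0 else "vision"


def steer(h: Sequence[float], direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the direction; alpha > 0 pushes toward text."""
    return [x + alpha * d for x, d in zip(h, direction)]


if __name__ == "__main__":
    # Toy hidden states (hypothetical) for examples where the model
    # followed the text evidence vs. the image evidence.
    text_states = [[1.0, 0.2], [0.8, 0.0]]
    vision_states = [[-1.0, 0.1], [-0.8, -0.1]]
    d = preference_direction(text_states, vision_states)
    mid = mean(text_states + vision_states)

    h = [-0.5, 0.0]
    print(probe(h, d, mid))                        # vision
    print(probe(steer(h, d, alpha=2.0), d, mid))   # text
```

The mean-difference direction is chosen here only because it needs no training; a linear probe trained with logistic regression would be an equally plausible alternative, and the strength `alpha` would in practice be tuned per layer.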
