Instruction Anchors: Dissecting the Causal Dynamics of Modality Arbitration
February 3, 2026
Authors: Yu Zhang, Mufan Xu, Xuefeng Bai, Kehai Chen, Pengfei Zhang, Yang Xiang, Min Zhang
cs.AI
Abstract
Modality following is the capacity of multimodal large language models (MLLMs) to selectively utilize multimodal contexts based on user instructions, and it is fundamental to ensuring safety and reliability in real-world deployments. However, the mechanisms underlying this decision-making process remain poorly understood. In this paper, we investigate its working mechanism through an information-flow lens. Our findings reveal that instruction tokens function as structural anchors for modality arbitration: shallow attention layers perform non-selective information transfer, routing multimodal cues to these anchors as a latent buffer; deep attention layers resolve modality competition under the guidance of the instruction intent, while MLP layers exhibit semantic inertia, acting as an adversarial force. Furthermore, we identify a sparse set of specialized attention heads that drive this arbitration. Causal interventions demonstrate that manipulating a mere 5% of these critical heads can decrease the modality-following ratio by 60% through blocking, or increase it by 60% through targeted amplification on failed samples. Our work provides a substantial step toward model transparency and offers a principled framework for the orchestration of multimodal information in MLLMs.
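The head-level causal interventions described in the abstract (blocking a head by zeroing its output, or amplifying it by scaling the output up) can be illustrated with a toy single-layer multi-head attention in NumPy. Everything here is a hypothetical sketch: the layer sizes, random weights, and per-head multiplicative `gate` are assumptions for illustration, not the authors' released implementation or the actual MLLM architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gated_attention(x, Wqkv, Wo, n_heads, gate):
    """Single self-attention layer with per-head multiplicative gates.

    gate[h] = 0 ablates head h (the "blocking" intervention);
    gate[h] > 1 amplifies head h (the "targeted amplification" intervention).
    Shapes: x (T, d), Wqkv (d, 3d), Wo (d, d), gate (n_heads,).
    Hypothetical illustration only, not the paper's code.
    """
    T, d = x.shape
    dk = d // n_heads
    q, k, v = np.split(x @ Wqkv, 3, axis=-1)
    # Reshape each projection to (n_heads, T, dk).
    split = lambda t: t.reshape(T, n_heads, dk).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dk))  # (h, T, T)
    ctx = att @ v                                          # (h, T, dk)
    # Causal intervention: scale each head's contribution before mixing.
    ctx = ctx * gate[:, None, None]
    return ctx.transpose(1, 0, 2).reshape(T, d) @ Wo

rng = np.random.default_rng(0)
T, d, h = 4, 8, 2
x = rng.standard_normal((T, d))
Wqkv = rng.standard_normal((d, 3 * d))
Wo = rng.standard_normal((d, d))

base = gated_attention(x, Wqkv, Wo, h, np.ones(h))
blocked = gated_attention(x, Wqkv, Wo, h, np.array([1.0, 0.0]))  # ablate 2nd head
amplified = gated_attention(x, Wqkv, Wo, h, np.array([1.0, 3.0]))  # amplify it
print(np.allclose(base, blocked), np.allclose(base, amplified))
```

In a real experiment, the gates would be applied inside the MLLM's attention layers (e.g. via forward hooks) to the roughly 5% of heads identified as arbitration-critical, and the effect would be measured as a change in the modality-following ratio rather than raw activations.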