Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding
March 9, 2026
Authors: Zhongxing Xu, Zhonghua Wang, Zhe Qian, Dachuan Shi, Feilong Tang, Ming Hu, Shiyan Su, Xiaocheng Zou, Wei Feng, Dwarikanath Mahapatra, Yifan Peng, Mingquan Lin, Zongyuan Ge
cs.AI
Abstract
Recent advancements in multimodal large reasoning models (MLRMs) have significantly improved performance in visual question answering. However, we observe that transition words (e.g., "because," "however," and "wait") are closely associated with hallucinations and tend to exhibit high-entropy states. We argue that adequate contextual reasoning information can be extracted directly from the token probability distribution. Inspired by superposed representation theory, we propose leveraging latent superposed reasoning to integrate multiple candidate semantics and maintain latent reasoning trajectories. Our hypothesis is that reliance on discrete textual inputs may drive the model toward sequential explicit reasoning, underutilizing dense contextual cues during high-entropy reasoning stages. We therefore propose constructing rich semantic representations from the token probability distribution to enhance in-context reasoning. To this end, we present Latent Entropy-Aware Decoding (LEAD), an efficient plug-and-play decoding strategy that leverages semantic context to achieve reliable reasoning. The heart of our method lies in entropy-aware reasoning mode switching: the model employs probability-weighted continuous embeddings under high-entropy states and transitions back to discrete token embeddings as entropy decreases. Moreover, we propose a prior-guided visual anchor injection strategy that encourages the model to focus on visual information. Extensive experiments show that LEAD effectively mitigates hallucinations across various MLRMs on multiple benchmarks.
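To make the core mechanism concrete, the following is a minimal sketch of the entropy-aware mode switching the abstract describes: when the next-token distribution is high-entropy, the step feeds back a probability-weighted mixture of candidate token embeddings (latent superposed reasoning); once entropy falls below a threshold, it reverts to the discrete greedy embedding. It assumes a Hugging Face-style causal model that accepts inputs_embeds; the threshold tau, the top-k truncation, and all function names are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def lead_decode_step(model, input_embeds, token_embedding, tau=2.0, k=8):
    """One hypothetical LEAD-style decoding step (sketch, not the authors' code).

    High entropy  -> probability-weighted continuous embedding over top-k candidates.
    Low entropy   -> standard discrete (greedy) token embedding.
    """
    # Next-token logits from the current latent input sequence: (B, V).
    logits = model(inputs_embeds=input_embeds).logits[:, -1, :]
    probs = F.softmax(logits, dim=-1)

    # Shannon entropy of the next-token distribution: (B,).
    entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)

    # Truncate to top-k candidates and renormalize so the continuous
    # embedding stays a focused mixture of plausible semantics.
    topk_p, topk_ids = probs.topk(k=k, dim=-1)
    topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)

    # Continuous embedding: probability-weighted sum of candidate embeddings, (B, D).
    cand_embeds = token_embedding(topk_ids)                       # (B, k, D)
    soft_embed = (topk_p.unsqueeze(-1) * cand_embeds).sum(dim=1)

    # Discrete embedding: greedy token, (B, D).
    hard_id = probs.argmax(dim=-1)
    hard_embed = token_embedding(hard_id)

    # Entropy-aware mode switch: superposed embedding only under high entropy.
    use_soft = (entropy > tau).unsqueeze(-1)                      # (B, 1)
    next_embed = torch.where(use_soft, soft_embed, hard_embed)
    return next_embed, hard_id, entropy
```

In an autoregressive loop, next_embed would be appended to input_embeds for the following step, so the model reasons over the superposed representation during uncertain (high-entropy) stages and over ordinary token embeddings elsewhere, which is what makes the strategy plug-and-play at decoding time.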