Mitigating Multimodal Hallucination via Phase-wise Self-reward
April 20, 2026
Authors: Yu Zhang, Chuyang Sun, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang
cs.AI
Abstract
Large Vision-Language Models (LVLMs) still struggle with visual hallucination, where generated responses are inconsistent with the visual input. Existing methods either rely on large-scale annotated data for fine-tuning, which incurs massive computational overhead, or employ static post-hoc strategies that overlook the dynamic nature of hallucination emergence. To address these limitations, we introduce a new self-rewarding framework that enables dynamic hallucination mitigation at inference time without external supervision. Empirically, we reveal that visual hallucination exhibits phase-wise dynamic patterns, peaking at the onset of each semantic phase. Drawing on these insights, we propose PSRD (Phase-wise Self-Reward Decoding) for online hallucination correction guided by phase-wise self-reward signals. To reduce the cost of repeated self-evaluation during decoding, we distill the hallucination guidance signal from LVLMs into a lightweight reward model. This reward model then provides on-the-fly guidance for targeted intervention during decoding, enabling precise hallucination suppression. PSRD significantly reduces the hallucination rate of LLaVA-1.5-7B by 50.0% and consistently outperforms existing post-hoc methods across five hallucination evaluation benchmarks on four LVLMs. Further analysis confirms that PSRD effectively mitigates hallucination propagation and achieves a highly controllable trade-off between strong performance and inference efficiency.
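The abstract describes reward-guided decoding at the onset of each semantic phase. The toy sketch below illustrates the general idea only: `propose_candidates` and `reward_score` are hypothetical stand-ins (not the paper's LVLM or distilled reward model), and "phase" is simplified to a fixed number of decoding rounds. At each phase onset, candidate continuations are re-ranked by a reward signal and the highest-scoring one is kept.

```python
def propose_candidates(prefix: str, k: int = 3) -> list[str]:
    """Stand-in for sampling k candidate phase continuations from an LVLM.

    Here we just append placeholder tokens; a real system would decode
    candidate segments from the model's distribution.
    """
    return [f"{prefix} <candidate-{i}>" for i in range(k)]


def reward_score(candidate: str) -> float:
    """Stand-in for a lightweight reward model scoring visual grounding.

    For illustration, score by the numeric id of the last placeholder;
    a real reward model would score consistency with the image.
    """
    return float(candidate.rsplit("-", 1)[-1].rstrip(">"))


def phase_wise_decode(prompt: str, num_phases: int = 2, k: int = 3) -> str:
    """At the onset of each phase, re-rank candidate continuations by the
    reward signal and commit to the highest-scoring one before continuing."""
    text = prompt
    for _ in range(num_phases):
        candidates = propose_candidates(text, k)
        text = max(candidates, key=reward_score)  # greedy reward-guided pick
    return text


print(phase_wise_decode("A photo of", num_phases=2, k=3))
```

Because intervention happens only at phase boundaries rather than at every token, the reward model is queried far less often, which is the efficiency trade-off the abstract refers to.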