Mitigating Object Hallucinations via Sentence-Level Early Intervention
July 16, 2025
Authors: Shangpin Peng, Senqiao Yang, Li Jiang, Zhuotao Tian
cs.AI
Abstract
Multimodal large language models (MLLMs) have revolutionized cross-modal
understanding but continue to struggle with hallucinations: fabricated content
contradicting visual inputs. Existing hallucination mitigation methods either
incur prohibitive computational costs or introduce distribution mismatches
between training data and model outputs. We identify a critical insight:
hallucinations predominantly emerge at the early stages of text generation and
propagate through subsequent outputs. To address this, we propose **SENTINEL**
(**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain
pr**E**ference **L**earning), a framework that eliminates dependency on human
annotations. Specifically, we first bootstrap high-quality in-domain preference
pairs by iteratively sampling model outputs, validating object existence
through cross-checking with two open-vocabulary detectors, and classifying
sentences into hallucinated/non-hallucinated categories. Subsequently, we use
context-coherent positive samples and hallucinated negative samples to build
context-aware preference data iteratively. Finally, we train models using a
context-aware preference loss (C-DPO) that emphasizes discriminative learning
at the sentence level where hallucinations initially manifest. Experimental
results show that SENTINEL can reduce hallucinations by over 90% compared to
the original model and outperforms the previous state-of-the-art method on both
hallucination benchmarks and general capabilities benchmarks, demonstrating its
superiority and generalization ability. The models, datasets, and code are
available at https://github.com/pspdada/SENTINEL.
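The abstract's first step, cross-checking object existence with two open-vocabulary detectors and labeling sentences as hallucinated or not, can be sketched as below. This is a minimal illustration of the cross-validation idea only; the detector outputs, the "uncertain" handling, and all function names are assumptions, not the paper's implementation.

```python
# Hypothetical sketch: a sentence's mentioned objects are validated against
# the detection sets produced by two open-vocabulary detectors. Only
# agreement between both detectors yields a confident label.

def cross_check(obj, detections_a, detections_b):
    """An object is grounded only if BOTH detectors find it in the image;
    if both agree it is absent, it is treated as hallucinated."""
    in_a = obj in detections_a
    in_b = obj in detections_b
    if in_a and in_b:
        return "grounded"
    if not in_a and not in_b:
        return "hallucinated"
    return "uncertain"  # detectors disagree; such cases could be discarded

def label_sentence(objects, detections_a, detections_b):
    """A sentence is non-hallucinated only when every mentioned object is
    grounded; any object both detectors agree is absent flags the sentence."""
    verdicts = [cross_check(o, detections_a, detections_b) for o in objects]
    if "hallucinated" in verdicts:
        return "hallucinated"
    if all(v == "grounded" for v in verdicts):
        return "non-hallucinated"
    return "uncertain"

# Example: both detectors see a dog and a frisbee, neither sees a kite.
det_a = {"dog", "frisbee", "grass"}
det_b = {"dog", "frisbee", "tree"}
print(label_sentence({"dog", "frisbee"}, det_a, det_b))  # non-hallucinated
print(label_sentence({"dog", "kite"}, det_a, det_b))     # hallucinated
```

Sentences labeled this way would then supply the positive/negative pairs for the preference-learning stage.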
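The abstract does not give the C-DPO formula, but it describes a DPO-style preference loss applied at the sentence level where hallucinations first appear. The sketch below is the standard DPO objective on a (positive, negative) sentence pair sharing the same context prefix; the `beta` value and the exact granularity are assumptions, not the paper's definition of C-DPO.

```python
import math

def sentence_dpo_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Standard DPO loss, -log sigmoid(beta * margin), on sentence-level
    log-probabilities under the trained policy and a frozen reference model.
    logp_* : log-prob of the positive/negative sentence under the policy
    ref_logp_* : same quantities under the reference model
    """
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    # -log(sigmoid(margin)) written via log1p(exp(-margin)) for stability
    return math.log1p(math.exp(-margin))

# The loss is small when the policy raises the context-coherent sentence's
# probability relative to the hallucinated one, and large in the reverse case.
preferred = sentence_dpo_loss(-1.0, -5.0, -2.0, -2.0)
reversed_ = sentence_dpo_loss(-5.0, -1.0, -2.0, -2.0)
print(preferred < reversed_)  # True
```

In training, this loss would be accumulated over the bootstrapped preference pairs, concentrating the learning signal on the early sentences where hallucinations originate.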