Mitigating Object Hallucinations via Sentence-Level Early Intervention

Abstract

Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations, that is, fabricated content that contradicts the visual input. Existing hallucination mitigation methods either incur prohibitive computational costs or introduce a distribution mismatch between the training data and the model's own outputs. We observe that hallucinations predominantly emerge in the early stages of text generation and then propagate through subsequent outputs. To address this, we propose **SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning), a framework that does not depend on human annotations. Specifically, we first bootstrap high-quality in-domain preference pairs by iteratively sampling model outputs, validating object existence by cross-checking with two open-vocabulary detectors, and classifying sentences as hallucinated or non-hallucinated. We then use context-coherent positive samples and hallucinated negative samples to iteratively build context-aware preference data. Finally, we train models with a context-aware preference loss (C-DPO) that emphasizes discriminative learning at the sentence level, where hallucinations first manifest. Experimental results show that SENTINEL reduces hallucinations by over 90% compared to the original model and outperforms the previous state-of-the-art method on both hallucination benchmarks and general capability benchmarks, demonstrating its superiority and generalization ability. The models, datasets, and code are available at https://github.com/pspdada/SENTINEL.
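
To make the data-construction step more concrete, the following is a minimal Python sketch of how sentences sampled for the same context might be cross-checked against two detectors and turned into preference pairs. It is an illustration under stated assumptions, not the released implementation: the `Detector` callable interface, the `classify_sentence` and `build_pairs` helpers, the `PreferencePair` record, and the exact agreement rule (hallucinated if both detectors reject some object, non-hallucinated if both confirm every object, otherwise discarded as uncertain) are all hypothetical choices made here for clarity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Literal, Tuple

# Hypothetical detector interface (an assumption, not a real library API):
# (image_id, object_name) -> True if the object is found in the image.
Detector = Callable[[str, str], bool]
Label = Literal["hallucinated", "non_hallucinated", "uncertain"]


def classify_sentence(image_id: str, objects: Iterable[str],
                      det_a: Detector, det_b: Detector) -> Label:
    """Cross-check each mentioned object with two detectors (assumed rule)."""
    votes = [(det_a(image_id, o), det_b(image_id, o)) for o in objects]
    # An object rejected by BOTH detectors marks the sentence as hallucinated.
    if any(not a and not b for a, b in votes):
        return "hallucinated"
    # Every object confirmed by BOTH detectors: non-hallucinated.
    # (A sentence mentioning no objects also lands here under this rule.)
    if all(a and b for a, b in votes):
        return "non_hallucinated"
    # The detectors disagree on at least one object: treat as uncertain.
    return "uncertain"


@dataclass
class PreferencePair:
    context: str   # previously accepted, non-hallucinated sentences
    chosen: str    # non-hallucinated continuation (positive sample)
    rejected: str  # hallucinated continuation (negative sample)


def build_pairs(image_id: str, context: str,
                candidates: List[Tuple[str, List[str]]],
                det_a: Detector, det_b: Detector) -> List[PreferencePair]:
    """Pair positives with negatives sampled for the same context.

    `candidates` holds (sentence, mentioned_objects) tuples; uncertain
    sentences are dropped.
    """
    labeled = [(s, classify_sentence(image_id, objs, det_a, det_b))
               for s, objs in candidates]
    positives = [s for s, lab in labeled if lab == "non_hallucinated"]
    negatives = [s for s, lab in labeled if lab == "hallucinated"]
    return [PreferencePair(context, p, n) for p in positives for n in negatives]
```

Because both the chosen and the rejected sentence share one context and come from the model's own samples, the resulting pairs stay in-domain. Discarding the cases where the detectors disagree trades data volume for label quality.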
