TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention

Abstract

Object hallucination (OH) is widely recognized as one of the major trustworthiness challenges in Large Vision-Language Models (LVLMs). Recent advances in Large Language Models (LLMs) indicate that internal states, such as hidden states, encode the "overall truthfulness" of generated responses. However, it remains under-explored how internal states in LVLMs function and whether they can serve as "per-token" hallucination indicators, which is essential for mitigating OH. In this paper, we first conduct an in-depth exploration of LVLM internal states in relation to OH and discover that (1) LVLM internal states are high-specificity per-token indicators of hallucination behaviors, and (2) different LVLMs encode universal patterns of hallucination in common latent subspaces, indicating that there exist "generic truthful directions" shared by various LVLMs. Based on these discoveries, we propose Truthful-Guided Pre-Intervention (TruthPrInt), which first learns the truthful direction of LVLM decoding and then applies truthful-guided inference-time intervention during decoding. We further propose ComnHallu, which enhances both cross-LVLM and cross-data hallucination detection transferability by constructing and aligning hallucination latent subspaces. We evaluate TruthPrInt in extensive experimental settings, including in-domain and out-of-domain scenarios, over popular LVLMs and OH benchmarks. Experimental results indicate that TruthPrInt significantly outperforms state-of-the-art methods. Code will be available at https://github.com/jinhaoduan/TruthPrInt.
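The idea of a per-token hallucination indicator read from hidden states can be illustrated with a small linear probe. The sketch below is not the authors' implementation: it assumes a logistic-regression probe, uses synthetic vectors in place of real LVLM hidden states, and all function names and parameters (train_linear_probe, threshold_for_specificity, the 0.95 specificity target) are hypothetical choices made for illustration.

```python
import numpy as np


def train_linear_probe(hidden, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe by gradient descent.

    hidden: (n_tokens, dim) array standing in for per-token hidden states.
    labels: (n_tokens,) array, 1 = hallucinated token, 0 = truthful token.
    """
    n, d = hidden.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(hidden @ w + b)))
        grad = probs - labels  # gradient of the mean log-loss w.r.t. logits
        w -= lr * (hidden.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def hallucination_scores(hidden, w, b):
    """Probability-like score per token; higher means more likely hallucinated."""
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b)))


def threshold_for_specificity(scores, labels, specificity=0.95):
    """Threshold below which roughly `specificity` of truthful tokens fall."""
    return float(np.quantile(scores[labels == 0], specificity))


# Synthetic stand-in data (assumption): truthful and hallucinated tokens
# differ along a single hidden direction, plus isotropic noise.
rng = np.random.default_rng(0)
dim = 16
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
truthful = rng.normal(size=(200, dim)) + 2.0 * direction
hallucinated = rng.normal(size=(200, dim)) - 2.0 * direction
hidden = np.vstack([truthful, hallucinated])
labels = np.concatenate([np.zeros(200), np.ones(200)])

w, b = train_linear_probe(hidden, labels)
scores = hallucination_scores(hidden, w, b)
tau = threshold_for_specificity(scores, labels)
flagged = scores > tau  # tokens the probe marks as hallucinated

specificity = float(np.mean(~flagged[labels == 0]))
recall = float(np.mean(flagged[labels == 1]))
# Cosine between the learned weights and the planted hallucination direction.
cosine = float(w @ (-direction) / np.linalg.norm(w))
```

Choosing the threshold from the truthful tokens alone fixes the false-alarm rate first, which mirrors the "high-specificity" framing in the abstract; how the paper itself selects thresholds or learns directions is not stated here.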
