TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
March 13, 2025
Authors: Jinhao Duan, Fei Kong, Hao Cheng, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu
cs.AI
Abstract
Object Hallucination (OH) has been acknowledged as one of the major
trustworthiness challenges in Large Vision-Language Models (LVLMs). Recent
advancements in Large Language Models (LLMs) indicate that internal states,
such as hidden states, encode the "overall truthfulness" of generated
responses. However, it remains under-explored how internal states in LVLMs
function and whether they could serve as "per-token" hallucination indicators,
which is essential for mitigating OH. In this paper, we first conduct an
in-depth exploration of LVLM internal states in relation to OH issues and
discover that (1) LVLM internal states are high-specificity per-token
indicators of hallucination behaviors. Moreover, (2) different LVLMs encode
universal patterns of hallucinations in common latent subspaces, indicating
that there exist "generic truthful directions" shared by various LVLMs. Based
on these discoveries, we propose Truthful-Guided Pre-Intervention (TruthPrInt)
that first learns the truthful direction of LVLM decoding and then applies
truthful-guided inference-time intervention during LVLM decoding. We further
propose ComnHallu to enhance both cross-LVLM and cross-data hallucination
detection transferability by constructing and aligning hallucination latent
subspaces. We evaluate TruthPrInt in extensive experimental settings, including
in-domain and out-of-domain scenarios, over popular LVLMs and OH benchmarks.
Experimental results indicate that TruthPrInt significantly outperforms
state-of-the-art methods. Code will be available at
https://github.com/jinhaoduan/TruthPrInt.
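To make the "per-token indicator" idea concrete, here is a minimal, hypothetical sketch, not the authors' released implementation: a linear probe fit on per-token hidden states whose weight vector doubles as a steering ("truthful") direction at inference time. The feature shapes, the 0/1 hallucination labels, and the intervention strength `alpha` are illustrative assumptions.

```python
# Hedged sketch: per-token truthfulness probe + truthful-direction steering.
# Shapes, labels, and alpha are assumptions, not the paper's exact recipe.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_truthful_probe(hidden_states: np.ndarray, labels: np.ndarray):
    """hidden_states: (num_tokens, d) per-token features collected from
    decoding traces; labels: 1 = truthful token, 0 = hallucinated token."""
    probe = LogisticRegression(max_iter=1000).fit(hidden_states, labels)
    # Unit-normalize the probe weights to serve as a steering direction.
    direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
    return probe, direction

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float = 2.0):
    """Shift one hidden state along the truthful direction before the LM
    head is applied; alpha is a hypothetical intervention strength."""
    return hidden + alpha * direction
```

At decode time, one would score each token's hidden state with `probe.predict_proba` and apply `steer` when the truthfulness score drops below a threshold; how to hook this into a specific LVLM's forward pass is deliberately left abstract here.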
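Similarly, a hedged sketch of the subspace-alignment idea behind ComnHallu, assuming both LVLMs share the same hidden dimension: build a low-rank hallucination subspace per model via PCA, then align one basis to the other with an orthogonal Procrustes solve. The paper's exact construction may differ.

```python
# Hedged sketch of cross-LVLM hallucination-subspace alignment.
# Assumes both models expose hidden states of equal dimension d.
import numpy as np
from sklearn.decomposition import PCA
from scipy.linalg import orthogonal_procrustes

def hallu_subspace(features: np.ndarray, k: int = 32) -> np.ndarray:
    """features: (num_tokens, d) hidden states from hallucination-labeled
    decoding traces; returns a (d, k) orthonormal basis."""
    return PCA(n_components=k).fit(features).components_.T

def align_bases(src_basis: np.ndarray, tgt_basis: np.ndarray) -> np.ndarray:
    """Solve for rotation R minimizing ||src_basis @ R - tgt_basis||_F, so
    a detector trained in the target subspace can be reused on source-model
    features projected onto the aligned basis."""
    R, _ = orthogonal_procrustes(src_basis, tgt_basis)
    return src_basis @ R
```

Under this sketch, `src_features @ align_bases(src_basis, tgt_basis)` yields coordinates roughly comparable to the target model's subspace, which is one plausible route to the cross-LVLM transferability the abstract describes.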