Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
October 3, 2024
Authors: Nick Jiang, Anish Kachinthaya, Suzie Petryk, Yossi Gandelsman
cs.AI
Abstract
We investigate the internal representations of vision-language models (VLMs)
to address hallucinations, a persistent challenge despite advances in model
size and training. We project VLMs' internal image representations to their
language vocabulary and observe more confident output probabilities on real
objects than hallucinated objects. We additionally use these output
probabilities to spatially localize real objects. Building on this approach, we
introduce a knowledge erasure algorithm that removes hallucinations by linearly
orthogonalizing image features with respect to hallucinated object features. We
show that targeted edits to a model's latent representations can reduce
hallucinations by up to 25.7% on the COCO2014 dataset while preserving
performance. Our findings demonstrate how a deeper understanding of VLMs'
latent representations can enhance reliability and enable novel capabilities,
such as zero-shot segmentation.
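
The abstract describes projecting a VLM's internal image-token representations onto the language vocabulary and reading off output probabilities for object words. Below is a minimal logit-lens-style sketch of that idea; it is not the authors' code, and names such as `hidden_states`, `final_norm`, `unembed`, and `image_token_slice` are illustrative placeholders for the corresponding pieces of a LLaVA-style VLM.

```python
# Sketch: map intermediate image-token activations to vocabulary probabilities
# by reusing the language model's final norm and unembedding matrix.
import torch

def image_token_vocab_probs(hidden_states: torch.Tensor,
                            final_norm: torch.nn.Module,
                            unembed: torch.Tensor,
                            image_token_slice: slice) -> torch.Tensor:
    """Project image-token hidden states onto the language vocabulary.

    hidden_states:     (seq_len, d_model) activations from some layer.
    final_norm:        the language model's final normalization layer.
    unembed:           (vocab_size, d_model) output (unembedding) matrix.
    image_token_slice: sequence positions occupied by image tokens.
    """
    img_hidden = hidden_states[image_token_slice]      # (n_img, d_model)
    logits = final_norm(img_hidden) @ unembed.T        # (n_img, vocab_size)
    return logits.softmax(dim=-1)
```

Per the abstract, the confidence of these probabilities on an object's word tends to be higher for real objects than for hallucinated ones, and the pattern across image-token positions gives a coarse spatial localization of real objects.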
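The knowledge erasure step is described as linearly orthogonalizing image features with respect to hallucinated object features. The sketch below shows one standard way to perform that projection; `image_features` and `object_direction` are assumed, illustrative inputs rather than the paper's exact interface.

```python
# Sketch: remove the component of each image feature that lies along a
# hallucinated object's feature direction, leaving features orthogonal to it.
import torch

def orthogonalize_image_features(image_features: torch.Tensor,
                                 object_direction: torch.Tensor) -> torch.Tensor:
    """Project out a hallucinated object's direction from image features.

    image_features:   (n_img, d_model) image-token representations.
    object_direction: (d_model,) feature direction of the hallucinated object.
    """
    direction = object_direction / object_direction.norm()
    coeffs = image_features @ direction                  # (n_img,) projection lengths
    return image_features - coeffs.unsqueeze(-1) * direction
```

Editing the latent image representations this way, rather than the decoding procedure, is what the abstract credits with reducing hallucinations on COCO2014 by up to 25.7% while preserving overall performance.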