AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
June 16, 2024
Authors: Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha
cs.AI
Abstract
Large vision-language models (LVLMs) hallucinate: certain context cues in an
image may trigger the language module's overconfident and incorrect reasoning
on abnormal or hypothetical objects. Though a few benchmarks have been
developed to investigate LVLM hallucinations, they mainly rely on hand-crafted
corner cases whose failure patterns may hardly generalize, and fine-tuning on them
could undermine their validity. These issues motivate us to develop the first
automatic benchmark generation approach, AUTOHALLUSION, that harnesses a few
principal strategies to create diverse hallucination examples. It probes the
language modules in LVLMs for context cues and uses them to synthesize images
by: (1) adding objects abnormal to the context cues; (2) for two co-occurring
objects, keeping one and excluding the other; or (3) removing objects closely
tied to the context cues. It then generates image-based questions whose
ground-truth answers contradict the language module's prior. A model has to
overcome contextual biases and distractions to reach correct answers, while
incorrect or inconsistent answers indicate hallucinations. AUTOHALLUSION
enables us to create new benchmarks at minimal cost and thus overcomes the
fragility of hand-crafted benchmarks. It also reveals common failure patterns
and reasons, providing key insights to detect, avoid, or control
hallucinations. Comprehensive evaluations of top-tier LVLMs, e.g.,
GPT-4V(ision), Gemini Pro Vision, Claude 3, and LLaVA-1.5, show a 97.7% and
98.7% success rate of hallucination induction on synthetic and real-world
datasets of AUTOHALLUSION, paving the way for a long battle against
hallucinations.
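The abstract describes a probe-synthesize-question pipeline. The sketch below illustrates, under stated assumptions, how the three image-manipulation strategies and the contradiction check might be orchestrated. It is not the authors' released implementation: all function and method names (probe_context_cues, synthesize_image, image_editor.add_object, etc.) are hypothetical placeholders standing in for the LVLM's language module, the image editor/generator, and the evaluated model.

```python
# Hypothetical sketch of an AUTOHALLUSION-style pipeline as described in the
# abstract. Every name below is an illustrative placeholder, not a real API.

import random

STRATEGIES = ("insert_abnormal_object", "keep_one_of_pair", "remove_correlated_object")


def probe_context_cues(language_module, scene_description):
    """Probe the LVLM's language module for objects it expects in the scene."""
    prompt = f"List objects you would expect to see in: {scene_description}"
    return language_module(prompt)  # e.g. ["oven", "fridge", "sink"] for a kitchen


def synthesize_case(scene_description, cues, strategy, image_editor):
    """Synthesize an image and a question whose ground truth contradicts the language prior."""
    if strategy == "insert_abnormal_object":
        # (1) add an object abnormal to the context cues
        target = image_editor.sample_unrelated_object(cues)
        image = image_editor.add_object(scene_description, target)
        question, ground_truth = f"Is there a {target} in the image?", "yes"
    elif strategy == "keep_one_of_pair":
        # (2) of two co-occurring objects, keep one and exclude the other
        kept, removed = image_editor.sample_cooccurring_pair(cues)
        image = image_editor.compose_with_only(scene_description, kept, exclude=removed)
        question, ground_truth = f"Is there a {removed} in the image?", "no"
    else:
        # (3) remove an object closely tied to the context cues
        target = image_editor.sample_strongly_tied_object(cues)
        image = image_editor.remove_object(scene_description, target)
        question, ground_truth = f"Is there a {target} in the image?", "no"
    return image, question, ground_truth


def induces_hallucination(lvlm, image, question, ground_truth):
    """A wrong (or prior-driven) answer to the image-grounded question counts as a hallucination."""
    answer = lvlm.answer(image, question)
    return answer.strip().lower() != ground_truth


def generate_benchmark_case(language_module, lvlm, image_editor, scene_description):
    cues = probe_context_cues(language_module, scene_description)
    strategy = random.choice(STRATEGIES)
    image, question, ground_truth = synthesize_case(scene_description, cues, strategy, image_editor)
    return {
        "image": image,
        "question": question,
        "ground_truth": ground_truth,
        "hallucinated": induces_hallucination(lvlm, image, question, ground_truth),
    }
```

Under this reading, the reported 97.7% and 98.7% hallucination-induction rates would correspond to the fraction of generated cases for which the evaluated LVLM's answer contradicts the image-grounded ground truth.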