Woodpecker: Hallucination Correction for Multimodal Large Language Models
October 24, 2023
Authors: Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen
cs.AI
Abstract
Hallucination is a big shadow hanging over the rapidly evolving Multimodal
Large Language Models (MLLMs), referring to the phenomenon that the generated
text is inconsistent with the image content. In order to mitigate
hallucinations, existing studies mainly resort to an instruction-tuning manner
that requires retraining the models with specific data. In this paper, we pave
a different way, introducing a training-free method named Woodpecker. Like a
woodpecker heals trees, it picks out and corrects hallucinations from the
generated text. Concretely, Woodpecker consists of five stages: key concept
extraction, question formulation, visual knowledge validation, visual claim
generation, and hallucination correction. Implemented as a post-hoc remedy,
Woodpecker can easily serve different MLLMs, and it remains interpretable
through the intermediate outputs of its five stages. We evaluate Woodpecker both
quantitatively and qualitatively and show the huge potential of this new
paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement
in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released
at https://github.com/BradyFU/Woodpecker.
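The five-stage pipeline named in the abstract can be sketched as a simple function composition. The sketch below is an illustrative assumption about the control flow only: the function names, signatures, and stub behaviors are hypothetical, not the released implementation (see the GitHub repository for the actual code).

```python
# Hypothetical sketch of Woodpecker's five-stage, training-free correction
# pipeline. Each stage is passed in as a callable so the flow can be shown
# without committing to any particular extractor, detector, or LLM.

def woodpecker_correct(image, mllm_response,
                       extract_concepts, formulate_questions,
                       validate_visually, generate_claims, correct_text):
    # Stage 1: key concept extraction — pull the main entities from the text.
    concepts = extract_concepts(mllm_response)
    # Stage 2: question formulation — turn concepts into verification questions.
    questions = formulate_questions(mllm_response, concepts)
    # Stage 3: visual knowledge validation — answer questions against the image.
    answers = validate_visually(image, questions)
    # Stage 4: visual claim generation — condense Q&A pairs into claims.
    claims = generate_claims(questions, answers)
    # Stage 5: hallucination correction — rewrite the response to match claims.
    return correct_text(mllm_response, claims)


# Toy stand-ins for each stage, just to exercise the control flow.
corrected = woodpecker_correct(
    image="image.jpg",
    mllm_response="A dog sits on the grass.",
    extract_concepts=lambda text: ["dog", "grass"],
    formulate_questions=lambda text, cs: [f"Is there a {c}?" for c in cs],
    validate_visually=lambda img, qs: ["yes"] * len(qs),
    generate_claims=lambda qs, ans: list(zip(qs, ans)),
    correct_text=lambda text, claims: text,  # nothing to fix in this toy case
)
```

Because the stages communicate only through plain intermediate values (concepts, questions, answers, claims), each one can be inspected or swapped independently, which is the interpretability property the abstract highlights.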