
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

October 15, 2024
Authors: Chenxi Wang, Xiang Chen, Ningyu Zhang, Bozhong Tian, Haoming Xu, Shumin Deng, Huajun Chen
cs.AI

Abstract

Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena, but the underlying reasons remain poorly understood. In this paper, we present an empirical analysis and find that, although MLLMs incorrectly generate the objects in the final output, they are actually able to recognize visual objects in the preceding layers. We speculate that this may be due to the strong knowledge priors of the language model suppressing the visual information, leading to hallucinations. Motivated by this, we propose a novel dynamic correction decoding method for MLLMs (DeCo), which adaptively selects the appropriate preceding layers and proportionally integrates knowledge into the final layer to adjust the output logits. Note that DeCo is model-agnostic and can be seamlessly combined with various classic decoding strategies and applied to different MLLMs. We evaluate DeCo on widely used benchmarks, demonstrating that it can reduce hallucination rates by a large margin compared to baselines, highlighting its potential to mitigate hallucinations. Code is available at https://github.com/zjunlp/DeCo.
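As a rough illustration of the mechanism described in the abstract, the sketch below shows what blending an earlier layer's logits into the final-layer logits might look like in PyTorch. It assumes per-layer next-token logits are available by projecting intermediate hidden states through the LM head (early-exit style); the confidence-based layer selection heuristic, the function name deco_style_correction, and the blending weight alpha are illustrative assumptions, not the paper's exact formulation (see the linked repository for the authors' implementation).

```python
import torch

def deco_style_correction(layer_logits, final_logits, candidate_layers, alpha=0.5):
    """
    Illustrative sketch only (not the authors' implementation).

    layer_logits: tensor [num_layers, vocab_size], next-token logits obtained by
        projecting each layer's hidden state through the LM head (early exit).
    final_logits: tensor [vocab_size], logits from the last layer.
    candidate_layers: list of earlier-layer indices to consider.
    alpha: assumed coefficient controlling how strongly the earlier layer's
        knowledge is integrated into the final-layer logits.
    """
    # Heuristic layer selection: pick the candidate layer whose next-token
    # distribution is most confident (one plausible proxy; the paper's actual
    # dynamic selection criterion may differ).
    probs = torch.softmax(layer_logits[candidate_layers], dim=-1)
    confidences = probs.max(dim=-1).values
    chosen = candidate_layers[int(confidences.argmax())]

    # Proportionally integrate the chosen earlier layer's logits into the
    # final-layer logits before greedy decoding or sampling.
    corrected = final_logits + alpha * layer_logits[chosen]
    return corrected

# Example usage with placeholder shapes (a real MLLM forward pass would supply these):
# layer_logits = torch.randn(32, 32000)          # 32 layers, 32k-token vocabulary
# final_logits = layer_logits[-1]
# out = deco_style_correction(layer_logits, final_logits, candidate_layers=list(range(20, 28)))
```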

