Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

March 16, 2026
Authors: Lexiang Xiong, Qi Li, Jingwen Ye, Xinchao Wang
cs.AI

Abstract

Vision-Language Models (VLMs) frequently "hallucinate", generating plausible yet factually incorrect statements, which poses a critical barrier to their trustworthy deployment. In this work, we propose a new paradigm for diagnosing hallucinations, recasting them from static output errors into dynamic pathologies of a model's computational cognition. Our framework is grounded in a normative principle of computational rationality, allowing us to model a VLM's generation as a dynamic cognitive trajectory. We design a suite of information-theoretic probes that project this trajectory onto an interpretable, low-dimensional Cognitive State Space. Our central discovery is a governing principle we term the geometric-information duality: a cognitive trajectory's geometric abnormality within this space is fundamentally equivalent to its high information-theoretic surprisal. Hallucination detection thus reduces to a geometric anomaly detection problem. Evaluated across diverse settings, from rigorous binary QA (POPE) and comprehensive reasoning (MME) to unconstrained open-ended captioning (MS-COCO), our framework achieves state-of-the-art performance. Crucially, it operates with high efficiency under weak supervision and remains highly robust even when the calibration data is heavily contaminated. This approach enables causal attribution of failures, mapping observable errors to distinct pathological states: perceptual instability (measured by Perceptual Entropy), logical-causal failure (measured by Inferential Conflict), and decisional ambiguity (measured by Decision Entropy). Ultimately, this opens a path toward building AI systems whose reasoning is transparent, auditable, and diagnosable by design.
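The framework, as described, scores each generation by (i) projecting its cognitive trajectory into a low-dimensional state space via entropy-style probes and (ii) flagging geometric outliers in that space. The minimal Python sketch below illustrates that general recipe, not the paper's actual method: the Shannon-entropy reading of "Decision Entropy", the Mahalanobis-distance detector, the 3-dimensional state space, and the 95th-percentile threshold are all illustrative assumptions, since the abstract specifies neither the probe formulas nor the anomaly detector.

```python
import numpy as np

def decision_entropy(next_token_probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of a next-token distribution.

    One plausible reading of the paper's "Decision Entropy" probe;
    the abstract does not give its exact definition.
    """
    p = next_token_probs[next_token_probs > 0]
    return float(-(p * np.log(p)).sum())

def mahalanobis_scores(states: np.ndarray, calib: np.ndarray) -> np.ndarray:
    """Score cognitive-state vectors by Mahalanobis distance from a
    (possibly weakly supervised) calibration set.

    Under the geometric-information duality, a large distance marks a
    trajectory as a geometric anomaly, i.e. a hallucination candidate.
    """
    mu = calib.mean(axis=0)
    # Small ridge keeps the covariance invertible for toy-sized data.
    cov = np.cov(calib, rowvar=False) + 1e-6 * np.eye(calib.shape[1])
    inv_cov = np.linalg.inv(cov)
    diff = states - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

# Toy usage: 3-D state vectors standing in for (Perceptual Entropy,
# Inferential Conflict, Decision Entropy) coordinates.
h = decision_entropy(np.array([0.7, 0.2, 0.1]))   # ~0.802 nats
rng = np.random.default_rng(0)
calib = rng.normal(size=(500, 3))                 # calibration generations
test = rng.normal(scale=2.0, size=(10, 3))        # candidate generations
threshold = np.quantile(mahalanobis_scores(calib, calib), 0.95)
flagged = mahalanobis_scores(test, calib) > threshold  # hallucination flags
```

If the calibration set is contaminated, as the abstract claims the real method tolerates, a robust covariance estimator (e.g. scikit-learn's `MinCovDet`) would be the natural drop-in replacement for the plain sample covariance used here.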