HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

January 26, 2026
Authors: Xinyue Zeng, Junhong Lin, Yujun Yan, Feng Guo, Liang Shi, Jun Wu, Dawei Zhou
cs.AI

Abstract

The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into a data-driven component and a reasoning-driven component, linked respectively to training-time distribution mismatch and inference-time instability. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we introduce HalluGuard, a Neural Tangent Kernel (NTK)-based score that leverages the geometry and representation space induced by the NTK to jointly identify data-driven and reasoning-driven hallucinations. We evaluate HalluGuard on 10 diverse benchmarks against 11 competitive baselines and across 9 popular LLM backbones; it consistently achieves state-of-the-art performance in detecting diverse forms of LLM hallucinations.
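
The abstract does not include implementation details, but the following minimal Python sketch illustrates the general idea of an NTK-induced score. It is not the authors' HalluGuard implementation: the depth-2 ReLU NTK closed form, the kernel-mean-distance scoring rule, and all names (`relu_ntk`, `halluguard_style_score`, the toy reference embeddings) are assumptions introduced only to show how a kernel's induced geometry can turn response embeddings into a hallucination-risk score.

```python
# Illustrative sketch only (not the paper's method): responses whose embeddings
# fall far from the NTK-induced geometry of a trusted reference set get a
# higher risk score.
import numpy as np


def relu_ntk(X, Y, depth=2):
    """NTK of an infinitely wide ReLU MLP, evaluated on unit-normalized rows.

    Uses the standard arc-cosine recursion; with unit-norm inputs the kernel
    diagonal stays constant, which keeps the closed form compact.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sigma = np.clip(X @ Y.T, -1.0, 1.0)  # layer-0 covariance (cosines)
    theta = sigma.copy()                 # layer-0 NTK
    for _ in range(depth):
        k0 = (np.pi - np.arccos(sigma)) / np.pi
        k1 = (sigma * (np.pi - np.arccos(sigma))
              + np.sqrt(np.clip(1.0 - sigma ** 2, 0.0, None))) / np.pi
        theta = theta * k0 + k1          # NTK recursion across layers
        sigma = k1                       # next-layer covariance
    return theta


def halluguard_style_score(query_emb, reference_embs, depth=2):
    """Squared distance of a query embedding from the NTK mean embedding of a
    trusted reference set; larger values suggest higher hallucination risk."""
    k_qq = relu_ntk(query_emb[None, :], query_emb[None, :], depth)[0, 0]
    k_qr = relu_ntk(query_emb[None, :], reference_embs, depth).mean()
    k_rr = relu_ntk(reference_embs, reference_embs, depth).mean()
    return float(k_qq - 2.0 * k_qr + k_rr)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 64
    center = rng.normal(size=d)
    reference = center + 0.3 * rng.normal(size=(200, d))  # grounded-answer embeddings
    grounded = center + 0.3 * rng.normal(size=d)          # stays in the reference geometry
    hallucinated = rng.normal(size=d)                      # drifts away from it
    print("grounded score:    ", halluguard_style_score(grounded, reference))
    print("hallucinated score:", halluguard_style_score(hallucinated, reference))
```

In this toy setup the off-distribution response scores markedly higher than the grounded one; the actual method would operate on model-internal representations and the empirical NTK described in the paper rather than on these synthetic embeddings and the generic ReLU kernel.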