
Hallucinations Undermine Trust; Metacognition is a Way Forward

May 2, 2026
Authors: Gal Yona, Mor Geva, Yossi Matias
cs.AI

Abstract

Despite significant strides in factual reliability, errors -- often termed hallucinations -- remain a major concern for generative AI, especially as LLMs are increasingly expected to be helpful in more complex or nuanced setups. Yet even in the simplest setting -- factoid question-answering with clear ground truth -- frontier models without external tools continue to hallucinate. We argue that most factuality gains in this domain have come from expanding the model's knowledge boundary (encoding more facts) rather than improving awareness of that boundary (distinguishing known from unknown). We conjecture that the latter is inherently difficult: models may lack the discriminative power to perfectly separate truths from errors, creating an unavoidable tradeoff between eliminating hallucinations and preserving utility. This tradeoff dissolves under a different framing. If we understand hallucinations as confident errors -- incorrect information delivered without appropriate qualification -- a third path emerges beyond the answer-or-abstain dichotomy: expressing uncertainty. We propose faithful uncertainty: aligning linguistic uncertainty with intrinsic uncertainty. This is one facet of metacognition -- the ability to be aware of one's own uncertainty and to act on it. For direct interaction, acting on uncertainty means communicating it honestly; for agentic systems, it becomes the control layer governing when to search and what to trust. Metacognition is thus essential for LLMs to be both trustworthy and capable; we conclude by highlighting open problems for progress towards this objective.
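The "third path" the abstract describes can be made concrete as a response policy: instead of the binary answer-or-abstain choice, a model's intrinsic confidence is mapped to three linguistic registers. The sketch below is purely illustrative and not from the paper; the `respond` function, its thresholds, and the hedging phrases are all assumptions chosen for the example.

```python
def respond(answer: str, confidence: float,
            high: float = 0.9, low: float = 0.4) -> str:
    """Map intrinsic confidence to a linguistically faithful response.

    Illustrative sketch of "faithful uncertainty": the hedging in the
    output tracks the model's internal confidence. The thresholds
    `high` and `low` are hypothetical, not taken from the paper.
    """
    if confidence >= high:
        # High confidence: answer plainly, without qualification.
        return answer
    if confidence >= low:
        # Intermediate confidence: answer, but qualify it explicitly.
        return f"I'm not certain, but possibly: {answer}"
    # Low confidence: abstain rather than risk a confident error.
    return "I don't know."
```

On this framing, a "hallucination" corresponds to returning the unqualified branch when `confidence` is in fact low; the middle branch is what the answer-or-abstain dichotomy leaves out.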