Hallucinations Undermine Trust; Metacognition is a Way Forward

May 2, 2026
Authors: Gal Yona, Mor Geva, Yossi Matias
cs.AI

Abstract

Despite significant strides in factual reliability, errors -- often termed hallucinations -- remain a major concern for generative AI, especially as LLMs are increasingly expected to be helpful in more complex or nuanced setups. Yet even in the simplest setting -- factoid question-answering with clear ground truth -- frontier models without external tools continue to hallucinate. We argue that most factuality gains in this domain have come from expanding the model's knowledge boundary (encoding more facts) rather than improving awareness of that boundary (distinguishing known from unknown). We conjecture that the latter is inherently difficult: models may lack the discriminative power to perfectly separate truths from errors, creating an unavoidable tradeoff between eliminating hallucinations and preserving utility. This tradeoff dissolves under a different framing. If we understand hallucinations as confident errors -- incorrect information delivered without appropriate qualification -- a third path emerges beyond the answer-or-abstain dichotomy: expressing uncertainty. We propose faithful uncertainty: aligning linguistic uncertainty with intrinsic uncertainty. This is one facet of metacognition -- the ability to be aware of one's own uncertainty and to act on it. For direct interaction, acting on uncertainty means communicating it honestly; for agentic systems, it becomes the control layer governing when to search and what to trust. Metacognition is thus essential for LLMs to be both trustworthy and capable; we conclude by highlighting open problems for progress towards this objective.
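As a toy illustration (our own sketch, not a method from the paper), "faithful uncertainty" can be read as a mapping from a model's intrinsic confidence to a matching linguistic qualifier, with abstention as the limiting case. The function name and the confidence thresholds below are arbitrary assumptions chosen for illustration only.

```python
# Toy sketch of faithful uncertainty: the strength of the verbal hedge
# attached to an answer tracks the model's intrinsic confidence p that
# the answer is correct. Thresholds are illustrative, not from the paper.

def verbalize(answer: str, p: float) -> str:
    """Return `answer` wrapped in a hedge whose strength matches p."""
    if p >= 0.9:
        return answer                                    # confident: state plainly
    if p >= 0.6:
        return f"I believe {answer}."                    # moderate: soft hedge
    if p >= 0.3:
        return f"I'm not sure, but possibly {answer}."   # weak: strong hedge
    return "I don't know."                               # very low: abstain

print(verbalize("Paris", 0.95))
print(verbalize("Canberra", 0.45))
```

Under this framing, the answer-or-abstain dichotomy corresponds to the two endpoints of the mapping, while the intermediate hedges are the "third path" the abstract describes.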