Hallucinations Undermine Trust; Metacognition is a Way Forward

Abstract

Despite significant strides in factual reliability, errors -- often termed hallucinations -- remain a major concern for generative AI, especially as LLMs are increasingly expected to help in more complex or nuanced settings. Yet even in the simplest setting -- factoid question-answering with clear ground truth -- frontier models without external tools continue to hallucinate. We argue that most factuality gains in this domain have come from expanding the model's knowledge boundary (encoding more facts) rather than from improving awareness of that boundary (distinguishing known from unknown). We conjecture that the latter is inherently difficult: models may lack the discriminative power to perfectly separate truths from errors, creating an unavoidable tradeoff between eliminating hallucinations and preserving utility. This tradeoff dissolves under a different framing. If we understand hallucinations as confident errors -- incorrect information delivered without appropriate qualification -- a third path emerges beyond the answer-or-abstain dichotomy: expressing uncertainty. We propose faithful uncertainty: aligning linguistic uncertainty with intrinsic uncertainty. This is one facet of metacognition -- the ability to be aware of one's own uncertainty and to act on it. For direct interaction, acting on uncertainty means communicating it honestly; for agentic systems, it becomes the control layer governing when to search and what to trust. Metacognition is thus essential for LLMs to be both trustworthy and capable; we conclude by highlighting open problems for progress towards this objective.
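
The tradeoff described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the authors' method or data: the `Answer` class, the confidence values, the `0.8` cutoff, and the function names are all hypothetical. The only point is that when the confidence scores of correct and incorrect answers overlap, an answer-or-abstain threshold cannot remove all errors without also dropping correct answers, whereas verbalizing uncertainty keeps every answer and leaves only the unqualified wrong ones as confident errors.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    confidence: float  # hypothetical intrinsic confidence in [0, 1]
    correct: bool      # whether the answer matches ground truth


# Invented toy data: correct and incorrect answers overlap in confidence.
ANSWERS = [
    Answer(0.95, True), Answer(0.90, True), Answer(0.85, False),
    Answer(0.70, True), Answer(0.60, False), Answer(0.40, True),
    Answer(0.30, False),
]


def answer_or_abstain(answers, threshold):
    """Abstain below `threshold`; return (errors delivered, correct delivered)."""
    answered = [a for a in answers if a.confidence >= threshold]
    errors = sum(not a.correct for a in answered)
    correct = sum(a.correct for a in answered)
    return errors, correct


def verbalize(confidence, confident_at=0.8):
    """Map intrinsic confidence to a linguistic marker (assumed cutoff)."""
    return "I'm confident that" if confidence >= confident_at else "I'm not sure, but"


def confident_errors(answers, confident_at=0.8):
    """Count wrong answers that would be delivered without qualification."""
    return sum((not a.correct) and a.confidence >= confident_at for a in answers)


if __name__ == "__main__":
    for t in (0.0, 0.5, 0.9):
        print(t, answer_or_abstain(ANSWERS, t))
    print("confident errors when hedging:", confident_errors(ANSWERS))
```

On this toy data, removing every error by thresholding (threshold `0.9`) also discards two of the four correct answers, while the hedging variant delivers all four correct answers and leaves a single confident error.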
