환각 현상은 신뢰를 훼손한다; 메타인지가 해결책이다

초록

사실적 신뢰도에서 상당한 진전이 있었음에도 환각(hallucination)이라 불리는 오류는 생성형 AI의 주요 문제로 남아 있으며, 특히 LLM이 더 복잡하거나 미묘한 환경에서 유용할 것이라는 기대가 커지면서 더욱 그러하다. 그러나 가장 단순한 설정, 즉 명확한 기준 진실이 있는 사실형 질의응답에서도 외부 도구를 사용하지 않는 최첨단 모델들은 여전히 환각을 보인다. 우리는 이 분야에서 대부분의 사실성 향상이 모델의 인식 경계 인식(알려진 것과 알려지지 않은 것을 구분하는 것)을 개선하기보다는 모델의 지식 경계를 확장(더 많은 사실을 인코딩)하는 데서 비롯되었다고 주장한다. 우리는 후자가 본질적으로 어렵다고 추측한다. 모델은 진실과 오류를 완벽하게 분리할 수 있는 판별력을 결여할 수 있으며, 이는 환각을 제거하는 것과 유용성을 보존하는 것 사이의 불가피한 상충 관계를 만들어낸다. 이 상충 관계는 다른 관점에서는 사라진다. 만약 환각을 확신을 갖고 제시된 오류, 즉 적절한 조건 없이 전달된 부정확한 정보로 이해한다면, 답변하거나 삼가는 이분법을 넘어선 제3의 길, 즉 불확실성을 표현하는 방법이 나타난다. 우리는 신뢰할 수 있는 불확실성(faithful uncertainty), 즉 언어적 불확실성을 내재적 불확실성과 일치시키는 것을 제안한다. 이것은 메타인지(자신의 불확실성을 인지하고 이에 따라 행동하는 능력)의 한 측면이다. 직접 상호작용에서는 불확실성에 따른 행동이 이를 정직하게 전달하는 것을 의미하며, 에이전트 시스템에서는 검색 시기와 신뢰 대상을 통제하는 제어 계층이 된다. 따라서 메타인지는 LLM이 신뢰할 수 있고 유능하기 위해 필수적이며, 우리는 이 목표를 향한 진전을 위한 미해결 과제를 강조하며 결론을 맺는다.

English

Despite significant strides in factual reliability, errors -- often termed hallucinations -- remain a major concern for generative AI, especially as LLMs are increasingly expected to be helpful in more complex or nuanced setups. Yet even in the simplest setting -- factoid question-answering with clear ground truth-frontier models without external tools continue to hallucinate. We argue that most factuality gains in this domain have come from expanding the model's knowledge boundary (encoding more facts) rather than improving awareness of that boundary (distinguishing known from unknown). We conjecture that the latter is inherently difficult: models may lack the discriminative power to perfectly separate truths from errors, creating an unavoidable tradeoff between eliminating hallucinations and preserving utility. This tradeoff dissolves under a different framing. If we understand hallucinations as confident errors -- incorrect information delivered without appropriate qualification -- a third path emerges beyond the answer-or-abstain dichotomy: expressing uncertainty. We propose faithful uncertainty: aligning linguistic uncertainty with intrinsic uncertainty. This is one facet of metacognition -- the ability to be aware of one's own uncertainty and to act on it. For direct interaction, acting on uncertainty means communicating it honestly; for agentic systems, it becomes the control layer governing when to search and what to trust. Metacognition is thus essential for LLMs to be both trustworthy and capable; we conclude by highlighting open problems for progress towards this objective.

환각 현상은 신뢰를 훼손한다; 메타인지가 해결책이다

Hallucinations Undermine Trust; Metacognition is a Way Forward

초록

Support