Why Language Models Hallucinate

Abstract

Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such "hallucinations" persist even in state-of-the-art systems and undermine trust. We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty, and we analyze the statistical causes of hallucinations in the modern training pipeline. Hallucinations need not be mysterious: they originate simply as errors in binary classification. If incorrect statements cannot be distinguished from facts, then hallucinations in pretrained language models will arise through natural statistical pressures. We then argue that hallucinations persist because of the way most evaluations are graded: language models are optimized to be good test-takers, and guessing when uncertain improves test performance. This "epidemic" of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards, rather than introducing additional hallucination evaluations. This change may steer the field toward more trustworthy AI systems.
