Why Language Models Hallucinate

Abstract

Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such "hallucinations" persist even in state-of-the-art systems and undermine trust. We argue that language models hallucinate because training and evaluation procedures reward guessing over acknowledging uncertainty, and we analyze the statistical causes of hallucinations in the modern training pipeline. Hallucinations need not be mysterious: they originate simply as errors in binary classification. If incorrect statements cannot be distinguished from facts, then hallucinations in pretrained language models will arise through natural statistical pressures. We then argue that hallucinations persist because of the way most evaluations are graded: language models are optimized to be good test-takers, and guessing when uncertain improves test performance. This "epidemic" of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards, rather than introducing additional hallucination evaluations. This change may steer the field toward more trustworthy AI systems.
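The claim that guessing when uncertain improves test performance can be illustrated with a little arithmetic. The sketch below is not taken from the abstract: the function names and the `wrong_penalty` parameter are hypothetical, and the scoring rule (+1 for a correct answer, minus a penalty for a wrong one, 0 for abstaining) is an assumed simplification. Under that assumption, a penalty of zero makes any guess at least as good as abstaining, while a positive penalty makes abstaining the better choice at low confidence.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering, under an assumed grading rule.

    Assumed rule (hypothetical, for illustration only):
      correct answer -> +1
      wrong answer   -> -wrong_penalty
      abstention     ->  0
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_guess(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only if the expected score beats abstaining (which scores 0)."""
    return expected_score(p_correct, wrong_penalty) > 0.0


if __name__ == "__main__":
    # Binary grading: wrong answers cost nothing, so even a 10% guess pays off.
    print(should_guess(0.10, wrong_penalty=0.0))
    # Penalized grading: the same low-confidence guess now loses on average.
    print(should_guess(0.10, wrong_penalty=1.0))
```

With this rule, answering beats abstaining exactly when `p_correct > wrong_penalty / (1 + wrong_penalty)`, so the penalty sets the confidence level below which admitting uncertainty is the better strategy.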
