

How Language Model Hallucinations Can Snowball

May 22, 2023
作者: Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
cs.AI

Abstract

A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. Hallucinations are often attributed to knowledge gaps in LMs, but we hypothesize that in some cases, when justifying previously generated hallucinations, LMs output false claims that they can separately recognize as incorrect. We construct three question-answering datasets where ChatGPT and GPT-4 often state an incorrect answer and offer an explanation with at least one incorrect claim. Crucially, we find that ChatGPT and GPT-4 can identify 67% and 87% of their own mistakes, respectively. We refer to this phenomenon as hallucination snowballing: an LM over-commits to early mistakes, leading to more mistakes that it otherwise would not make.
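As a rough illustration of the two-step probing the abstract describes (this is not the authors' released code), the sketch below first asks a model a question and lets it justify its answer, then, in a separate conversation, asks the same model whether one claim from that justification is true. It uses the OpenAI Python SDK; the model name, prompts, example question, and the naive claim-extraction step are all assumptions for illustration.

```python
# Minimal sketch of probing hallucination snowballing: can a model, in a
# fresh conversation, reject a claim it produced while justifying an
# earlier answer? Prompts and the example question are hypothetical.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4"  # assumed model identifier


def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's reply."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content


# Step 1: the model commits to a yes/no answer and justifies it.
question = "Is 9677 divisible by 13? Answer yes or no, then explain."  # hypothetical item
answer_with_explanation = ask(question)

# Step 2: extract one claim from the justification (naive split, for
# illustration only) and have the model verify it with no prior context.
sentences = [s.strip() for s in answer_with_explanation.split(".") if s.strip()]
claim = sentences[1] if len(sentences) > 1 else sentences[0]
verdict = ask(f"Is the following statement true or false? {claim}")

print(answer_with_explanation)
# If the verdict is "false", the model has recognized as incorrect a claim
# it made only to support its earlier answer -- the snowballing pattern.
print(verdict)
```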