From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
July 8, 2024
Authors: Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva
cs.AI
Abstract
Large language models (LLMs) often exhibit undesirable behaviors, such as
hallucinations and sequence repetitions. We propose to view these behaviors as
fallbacks that models exhibit under uncertainty, and investigate the connection
between them. We categorize fallback behaviors -- sequence repetitions,
degenerate text, and hallucinations -- and extensively analyze them in models
from the same family that differ by the amount of pretraining tokens, parameter
count, or the inclusion of instruction-following training. Our experiments
reveal a clear and consistent ordering of fallback behaviors, across all these
axes: the more advanced an LLM is (i.e., trained on more tokens, has more
parameters, or instruction-tuned), its fallback behavior shifts from sequence
repetitions, to degenerate text, and then to hallucinations. Moreover, the same
ordering is observed throughout a single generation, even for the
best-performing models; as uncertainty increases, models shift from generating
hallucinations to producing degenerate text and then sequence repetitions.
Lastly, we demonstrate that while common decoding techniques, such as random
sampling, might alleviate some unwanted behaviors like sequence repetitions,
they increase harder-to-detect hallucinations.
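The last point, that switching from greedy decoding to random sampling trades repetition loops for other failure modes, can be probed informally with off-the-shelf tooling. The sketch below is illustrative only and is not the authors' experimental setup; the model ("gpt2"), prompt, and sampling parameters are assumptions chosen for brevity. It contrasts greedy decoding, which is more prone to sequence repetitions, with temperature sampling on the same prompt.

# Illustrative sketch (not the paper's setup): compare greedy decoding with
# random sampling for one prompt. Model, prompt, and parameters are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of the smallest country in the world is"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: deterministic, more likely to fall into repetition loops.
greedy = model.generate(
    **inputs, max_new_tokens=60, do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Random (temperature) sampling: tends to break repetition loops, but the
# abstract notes it can increase harder-to-detect hallucinations.
sampled = model.generate(
    **inputs, max_new_tokens=60, do_sample=True, temperature=1.0, top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

print("Greedy :", tokenizer.decode(greedy[0], skip_special_tokens=True))
print("Sampled:", tokenizer.decode(sampled[0], skip_special_tokens=True))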