

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

July 8, 2024
Authors: Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva
cs.AI

Abstract

Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under uncertainty, and investigate the connection between them. We categorize fallback behaviors -- sequence repetitions, degenerate text, and hallucinations -- and extensively analyze them in models from the same family that differ by the amount of pretraining tokens, parameter count, or the inclusion of instruction-following training. Our experiments reveal a clear and consistent ordering of fallback behaviors across all these axes: the more advanced an LLM is (i.e., trained on more tokens, with more parameters, or instruction-tuned), the more its fallback behavior shifts from sequence repetitions, to degenerate text, and then to hallucinations. Moreover, the same ordering is observed throughout a single generation, even for the best-performing models; as uncertainty increases, models shift from generating hallucinations to producing degenerate text and then sequence repetitions. Lastly, we demonstrate that while common decoding techniques, such as random sampling, might alleviate some unwanted behaviors like sequence repetitions, they increase harder-to-detect hallucinations.
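The trade-off the abstract describes hinges on how the next token is chosen from the model's output distribution. The sketch below (not the paper's code; function name and toy logits are illustrative) shows the two decoding modes being contrasted: greedy argmax decoding, which can lock into repetition loops, versus temperature-scaled random sampling, which breaks loops by occasionally picking lower-probability tokens -- the same mechanism that can surface harder-to-detect hallucinations.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Pick a next-token index from raw logits.

    temperature=0 means greedy (argmax) decoding, which is prone to
    repetition loops; higher temperatures flatten the distribution,
    reducing repetition but admitting lower-probability continuations.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature scaling (shift by max for stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting categorical distribution.
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

With a very low temperature the sampler behaves almost greedily; raising the temperature spreads probability mass across more tokens, which is exactly the lever the paper finds trades sequence repetitions for hallucinations.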


November 28, 2024