From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

Abstract

Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under uncertainty, and investigate the connection between them. We categorize fallback behaviors into sequence repetitions, degenerate text, and hallucinations, and extensively analyze them in models from the same family that differ in the amount of pretraining tokens, parameter count, or the inclusion of instruction-following training. Our experiments reveal a clear and consistent ordering of fallback behaviors across all these axes: the more advanced an LLM is (i.e., trained on more tokens, has more parameters, or instruction-tuned), the further its fallback behavior shifts from sequence repetitions to degenerate text and then to hallucinations. Moreover, the same ordering is observed throughout a single generation, even for the best-performing models: as uncertainty increases, models shift from generating hallucinations to producing degenerate text and then sequence repetitions. Lastly, we demonstrate that while common decoding techniques, such as random sampling, might alleviate some unwanted behaviors like sequence repetitions, they increase harder-to-detect hallucinations.
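Of the three fallback behaviors, sequence repetition is the easiest to flag mechanically. The sketch below shows one minimal way to do so. It assumes the generation is already available as a list of tokens (here simply whitespace-split words) and that a "loop" means an exact block of tokens repeated back to back at the end of the generation. The function name `tail_repetition` and its parameters `min_repeats` and `max_period` are hypothetical, the default thresholds are arbitrary, and this is not the detection procedure used in the paper. Degenerate text and hallucinations cannot be caught this way; hallucinations in particular require checking the generated content against a factual reference.

```python
from typing import Optional, Sequence, Tuple


def tail_repetition(
    tokens: Sequence[str],
    min_repeats: int = 3,
    max_period: int = 50,
) -> Optional[Tuple[int, int]]:
    """Return (period, repeats) if the sequence ends in a loop, else None.

    Hypothetical heuristic: a loop is a block of `period` tokens repeated
    back to back at least `min_repeats` times at the very end of `tokens`.
    The smallest qualifying period is returned, so "a b a b a b" yields
    period 2 rather than 4.
    """
    n = len(tokens)
    # Only periods short enough to fit `min_repeats` copies are worth testing.
    for period in range(1, min(max_period, n // min_repeats) + 1):
        block = list(tokens[n - period:])
        repeats = 1
        # Walk backwards one block at a time while the block keeps matching.
        start = n - 2 * period
        while start >= 0 and list(tokens[start:start + period]) == block:
            repeats += 1
            start -= period
        if repeats >= min_repeats:
            return period, repeats
    return None


if __name__ == "__main__":
    looping = "I think the answer is 42 . the answer is 42 . the answer is 42 .".split()
    fluent = "The capital of France is Paris .".split()
    print(tail_repetition(looping))  # (5, 3)
    print(tail_repetition(fluent))   # None
```

Because the match is exact, a loop whose copies differ by even a single token is not flagged; catching such near-repetitions would need a fuzzier criterion, such as n-gram overlap.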
