How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Abstract

Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, the mechanisms by which they acquire that knowledge through pretraining remain poorly understood. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. The findings reveal several important insights into the dynamics of factual knowledge acquisition during pretraining. First, counterintuitively, we observe that pretraining on more data shows no significant improvement in the model's capability to acquire and maintain factual knowledge. Second, there is a power-law relationship between training steps and the forgetting of both memorization and generalization of factual knowledge, and LLMs trained with duplicated training data exhibit faster forgetting. Third, training LLMs with larger batch sizes can enhance the models' robustness to forgetting. Overall, our observations suggest that factual knowledge acquisition in LLM pretraining occurs by progressively increasing the probability of factual knowledge presented in the pretraining data at each step. However, this increase is diluted by subsequent forgetting. Based on this interpretation, we demonstrate that we can provide plausible explanations for recently observed behaviors of LLMs, such as the poor performance of LLMs on long-tail knowledge and the benefits of deduplicating the pretraining corpus.
