LLM Circuit Analyses Are Consistent Across Training and Scale

Abstract

Most currently deployed large language models (LLMs) undergo continuous training or additional finetuning. By contrast, most research into LLMs' internal mechanisms focuses on models at a single snapshot in time (the end of pre-training), raising the question of whether the results generalize to real-world settings. Existing studies of mechanisms over time focus on encoder-only or toy models, which differ significantly from most deployed models. In this study, we track how model mechanisms, operationalized as circuits, emerge and evolve across 300 billion tokens of training in decoder-only LLMs, in models ranging from 70 million to 2.8 billion parameters. We find that task abilities and the functional components that support them emerge consistently at similar token counts across scale. Moreover, although such components may be implemented by different attention heads over time, the overarching algorithm that they implement persists. Surprisingly, both these algorithms and the types of components involved in them can replicate across model scale. These results suggest that circuit analyses conducted on small models at the end of pre-training can provide insights that still apply after additional pre-training and across model scale.

