LLM Circuit Analyses Are Consistent Across Training and Scale

Abstract

Most currently deployed large language models (LLMs) undergo continuous training or additional finetuning. By contrast, most research into LLMs' internal mechanisms focuses on models at one snapshot in time (the end of pre-training), raising the question of whether their results generalize to real-world settings. Existing studies of mechanisms over time focus on encoder-only or toy models, which differ significantly from most deployed models. In this study, we track how model mechanisms, operationalized as circuits, emerge and evolve across 300 billion tokens of training in decoder-only LLMs, in models ranging from 70 million to 2.8 billion parameters. We find that task abilities and the functional components that support them emerge consistently at similar token counts across scale. Moreover, although such components may be implemented by different attention heads over time, the overarching algorithm that they implement remains the same. Surprisingly, both these algorithms and the types of components involved therein can replicate across model scales. These results suggest that circuit analyses conducted on small models at the end of pre-training can provide insights that still apply after additional pre-training and across model scales.
