LLM Circuit Analyses Are Consistent Across Training and Scale
July 15, 2024
Authors: Curt Tigges, Michael Hanna, Qinan Yu, Stella Biderman
cs.AI
Abstract
Most currently deployed large language models (LLMs) undergo continuous
training or additional finetuning. By contrast, most research into LLMs'
internal mechanisms focuses on models at one snapshot in time (the end of
pre-training), raising the question of whether their results generalize to
real-world settings. Existing studies of mechanisms over time focus on
encoder-only or toy models, which differ significantly from most deployed
models. In this study, we track how model mechanisms, operationalized as
circuits, emerge and evolve across 300 billion tokens of training in
decoder-only LLMs, in models ranging from 70 million to 2.8 billion parameters.
We find that task abilities and the functional components that support them
emerge consistently at similar token counts across scale. Moreover, although
such components may be implemented by different attention heads over time, the
overarching algorithm that they implement remains. Surprisingly, both these
algorithms and the types of components involved therein can replicate across
model scale. These results suggest that circuit analyses conducted on small
models at the end of pre-training can provide insights that still apply after
additional pre-training and over model scale.
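The abstract describes tracking mechanisms across 300 billion tokens of training in decoder-only models from 70 million to 2.8 billion parameters; those specifics match EleutherAI's Pythia suite, which publishes intermediate pre-training checkpoints, though the abstract itself does not name the models. Below is a minimal sketch, assuming Pythia, of the kind of measurement the study implies: loading several checkpoints from training and checking when a task ability emerges. The IOI-style prompt and logit-difference metric are illustrative assumptions, not details taken from the abstract.

# Minimal sketch (assumption: the models are EleutherAI's Pythia suite,
# which hosts intermediate checkpoints as git revisions named "step{N}").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-70m"          # assumed: smallest scale in the study
CHECKPOINT_STEPS = [1000, 10000, 143000]  # early, mid, and final checkpoints

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Illustrative task: indirect-object identification. The model should
# prefer " Mary" over " John" as the completion once the ability emerges.
prompt = "When Mary and John went to the store, John gave a drink to"
mary_id = tokenizer(" Mary").input_ids[0]
john_id = tokenizer(" John").input_ids[0]

for step in CHECKPOINT_STEPS:
    # Each training checkpoint is a separate revision of the same repo.
    model = AutoModelForCausalLM.from_pretrained(MODEL, revision=f"step{step}")
    model.eval()
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    diff = (logits[mary_id] - logits[john_id]).item()
    print(f"step {step:>6}: logit diff (Mary - John) = {diff:.3f}")

A logit difference that grows and stabilizes across checkpoints would mark the token count at which the task ability emerges; a full circuit analysis in the paper's sense would additionally attribute the behavior to specific attention heads at each checkpoint and compare them over training time and across model sizes.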