LLM Circuit Analyses Are Consistent Across Training and Scale

Abstract

Most currently deployed large language models (LLMs) undergo continuous training or additional finetuning. By contrast, most research into LLMs' internal mechanisms focuses on models at a single snapshot in time (the end of pre-training), raising the question of whether those results generalize to real-world settings. Existing studies of mechanisms over time focus on encoder-only or toy models, which differ significantly from most deployed models. In this study, we track how model mechanisms, operationalized as circuits, emerge and evolve across 300 billion tokens of training in decoder-only LLMs ranging from 70 million to 2.8 billion parameters. We find that task abilities and the functional components that support them emerge consistently at similar token counts across scale. Moreover, although such components may be implemented by different attention heads over time, the overarching algorithm that they implement remains the same. Surprisingly, both these algorithms and the types of components involved in them can replicate across model scale. These results suggest that circuit analyses conducted on small models at the end of pre-training can provide insights that still apply after additional pre-training and across model scale.

