How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

Abstract

Although Large Language Models (LLMs) show exceptional capabilities in knowledge-intensive tasks, there is a critical gap in our understanding of how they internalize new knowledge, particularly how acquired knowledge is structurally embedded in their neural computations. We address this issue through the lens of knowledge circuit evolution, identifying the computational subgraphs that facilitate knowledge storage and processing. Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern. These insights advance the theoretical understanding of how LLMs acquire new knowledge and carry potential implications for improving continual pre-training strategies to enhance model performance. Code and data will be available at https://github.com/zjunlp/DynamicKnowledgeCircuits.
