How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
February 16, 2025
Authors: Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen
cs.AI
Abstract
Despite exceptional capabilities in knowledge-intensive tasks, Large Language
Models (LLMs) face a critical gap in understanding how they internalize new
knowledge, particularly how to structurally embed acquired knowledge in their
neural computations. We address this issue through the lens of knowledge
circuit evolution, identifying computational subgraphs that facilitate
knowledge storage and processing. Our systematic analysis of circuit evolution
throughout continual pre-training reveals several key findings: (1) the
acquisition of new knowledge is influenced by its relevance to pre-existing
knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase
shift from formation to optimization; (3) the evolution of knowledge circuits
follows a deep-to-shallow pattern. These insights not only advance our
theoretical understanding of the mechanisms of new knowledge acquisition in
LLMs, but also provide potential implications for improving continual
pre-training strategies to enhance model performance. Code and data will be
available at https://github.com/zjunlp/DynamicKnowledgeCircuits.
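The abstract's central construct, a "knowledge circuit", is a computational subgraph of model components (attention heads, MLP blocks) whose connections are judged important for storing and processing a target fact. As a rough illustration of that general idea, and not the paper's actual method, the sketch below prunes a toy component graph by per-edge attribution score and keeps the surviving subgraph as the circuit. The component names, the scores, and the threshold are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A node is a model component, identified by (layer index, component name),
# e.g. (3, "attn_head_5") or (3, "mlp"). This is a toy representation; a real
# analysis would enumerate the components of an actual transformer.
Node = Tuple[int, str]
Edge = Tuple[Node, Node]


@dataclass
class Circuit:
    """A knowledge circuit: the subgraph of components kept after pruning."""
    nodes: List[Node]
    edges: List[Edge]


def extract_circuit(edge_scores: Dict[Edge, float], threshold: float) -> Circuit:
    """Keep edges whose attribution score is at least `threshold`.

    `edge_scores` maps a directed edge (upstream -> downstream component) to an
    importance score for the target fact, e.g. the drop in the correct answer's
    probability when that edge is ablated or patched. How those scores are
    produced is outside this sketch.
    """
    kept_edges = [edge for edge, score in edge_scores.items() if score >= threshold]
    kept_nodes = sorted({node for edge in kept_edges for node in edge})
    return Circuit(nodes=kept_nodes, edges=kept_edges)


if __name__ == "__main__":
    # Hypothetical attribution scores for a tiny toy graph.
    scores: Dict[Edge, float] = {
        ((0, "attn_head_3"), (1, "mlp")): 0.82,
        ((0, "mlp"), (1, "attn_head_1")): 0.11,
        ((1, "mlp"), (2, "output")): 0.67,
        ((1, "attn_head_1"), (2, "output")): 0.05,
    }
    circuit = extract_circuit(scores, threshold=0.5)
    print("Circuit nodes:", circuit.nodes)
    print("Circuit edges:", circuit.edges)
```

In practice such scores typically come from ablation or activation-patching style attribution on a real model, and the circuit-evolution analysis the abstract describes would correspond to extracting circuits at successive continual pre-training checkpoints and comparing how their nodes and edges change over training.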