How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
February 16, 2025
Authors: Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen
cs.AI
Abstract
Despite exceptional capabilities in knowledge-intensive tasks, Large Language
Models (LLMs) face a critical gap in understanding how they internalize new
knowledge, particularly how to structurally embed acquired knowledge in their
neural computations. We address this issue through the lens of knowledge
circuit evolution, identifying computational subgraphs that facilitate
knowledge storage and processing. Our systematic analysis of circuit evolution
throughout continual pre-training reveals several key findings: (1) the
acquisition of new knowledge is influenced by its relevance to pre-existing
knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase
shift from formation to optimization; (3) the evolution of knowledge circuits
follows a deep-to-shallow pattern. These insights not only advance our
theoretical understanding of the mechanisms of new knowledge acquisition in
LLMs, but also provide potential implications for improving continual
pre-training strategies to enhance model performance. Code and data will be
available at https://github.com/zjunlp/DynamicKnowledgeCircuits.
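The abstract's central construct, a "knowledge circuit", is a computational subgraph of model components (attention heads, MLP blocks) whose connections are judged important for storing and processing a target fact. As a rough illustration of that general idea, and not the paper's actual method, the sketch below prunes a toy component graph by per-edge attribution score and keeps the surviving subgraph as the circuit. The component names, the scores, and the threshold are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A node is a model component, identified by (layer index, component name),
# e.g. (3, "attn_head_5") or (3, "mlp"). This is a toy representation; a real
# analysis would enumerate the components of an actual transformer.
Node = Tuple[int, str]
Edge = Tuple[Node, Node]


@dataclass
class Circuit:
    """A knowledge circuit: the subgraph of components kept after pruning."""
    nodes: List[Node]
    edges: List[Edge]


def extract_circuit(edge_scores: Dict[Edge, float], threshold: float) -> Circuit:
    """Keep edges whose attribution score is at least `threshold`.

    `edge_scores` maps a directed edge (upstream -> downstream component) to an
    importance score for the target fact, e.g. the drop in the correct answer's
    probability when that edge is ablated or patched. How those scores are
    produced is outside this sketch.
    """
    kept_edges = [edge for edge, score in edge_scores.items() if score >= threshold]
    kept_nodes = sorted({node for edge in kept_edges for node in edge})
    return Circuit(nodes=kept_nodes, edges=kept_edges)


if __name__ == "__main__":
    # Hypothetical attribution scores for a tiny toy graph.
    scores: Dict[Edge, float] = {
        ((0, "attn_head_3"), (1, "mlp")): 0.82,
        ((0, "mlp"), (1, "attn_head_1")): 0.11,
        ((1, "mlp"), (2, "output")): 0.67,
        ((1, "attn_head_1"), (2, "output")): 0.05,
    }
    circuit = extract_circuit(scores, threshold=0.5)
    print("Circuit nodes:", circuit.nodes)
    print("Circuit edges:", circuit.edges)
```

In practice such scores typically come from ablation or activation-patching style attribution on a real model, and the circuit-evolution analysis the abstract describes would correspond to extracting circuits at successive continual pre-training checkpoints and comparing how their nodes and edges change over training.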