Large Language Models are Locally Linear Mappings
May 30, 2025
Author: James R. Golden
cs.AI
Abstract
We demonstrate that the inference operations of several open-weight large
language models (LLMs) can be mapped to an exactly equivalent linear system for
an input sequence without modifying the model weights or altering output
predictions. Extending techniques from image diffusion models that exhibit
local or piecewise linearity, we strategically alter the gradient computation
with respect to a given input sequence for a next-token prediction such that
the Jacobian of the model nearly exactly reproduces the forward prediction with
a linear system. We demonstrate this approach across models (Llama 3, Gemma 3,
Qwen 3, Phi 4, Mistral Ministral and OLMo 2, up to Llama 3.3 70B Q4) and show
through the singular value decomposition of the detached Jacobian that these
LLMs operate in extremely low-dimensional subspaces where many of the largest
singular vectors decode to concepts related to the most-likely output token.
This approach also allows us to examine the operation of each successive layer
(and its attention and MLP components) as nearly-exact linear systems and
observe the emergence of semantic concepts. Despite their expressive power and
global nonlinearity, modern LLMs can be interpreted through nearly-exact
locally linear decompositions that provide insights into their internal
representations and reveal interpretable semantic structures in the next-token
prediction process.
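The locally linear property at the heart of the abstract can be sketched on a toy scale. The sketch below is not the paper's detached-Jacobian procedure for full transformers; it is a minimal illustration, assuming a bias-free ReLU network, where the map is exactly piecewise linear and the Jacobian at an input reproduces the forward pass, and where the SVD of that local Jacobian can then be inspected as the abstract describes for LLMs.

```python
import torch

torch.manual_seed(0)

# Bias-free ReLU network: piecewise linear, so f(x) = J(x) @ x holds exactly
# within each linear region (hypothetical toy stand-in for an LLM layer stack).
W1 = torch.randn(16, 8)
W2 = torch.randn(4, 16)

def f(x):
    return W2 @ torch.relu(W1 @ x)

x = torch.randn(8)

# Local Jacobian of the forward map at this specific input.
J = torch.autograd.functional.jacobian(f, x)  # shape (4, 8)

# The linear system J @ x reproduces the forward prediction for this input.
assert torch.allclose(f(x), J @ x, atol=1e-5)

# SVD of the local Jacobian exposes the dimensionality of the local map,
# analogous to the paper's analysis of which singular vectors matter.
U, S, Vh = torch.linalg.svd(J)
print(S)
```

For real LLMs the nonlinearities (softmax attention, gated MLPs, normalization) are not positively homogeneous, which is why the paper must strategically detach parts of the gradient computation to recover this exact-reconstruction property; the toy case only shows why such a locally linear view is possible at all.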