Large Language Models are Locally Linear Mappings
May 30, 2025
Author: James R. Golden
cs.AI
Abstract
We demonstrate that the inference operations of several open-weight large
language models (LLMs) can be mapped to an exactly equivalent linear system for
an input sequence without modifying the model weights or altering output
predictions. Extending techniques from image diffusion models that exhibit
local or piecewise linearity, we strategically alter the gradient computation
with respect to a given input sequence for a next-token prediction such that
the Jacobian of the model nearly exactly reproduces the forward prediction with
a linear system. We demonstrate this approach across models (Llama 3, Gemma 3,
Qwen 3, Phi 4, Mistral Ministral and OLMo 2, up to Llama 3.3 70B Q4) and show
through the singular value decomposition of the detached Jacobian that these
LLMs operate in extremely low-dimensional subspaces where many of the largest
singular vectors decode to concepts related to the most-likely output token.
This approach also allows us to examine the operation of each successive layer
(and its attention and MLP components) as nearly-exact linear systems and
observe the emergence of semantic concepts. Despite their expressive power and
global nonlinearity, modern LLMs can be interpreted through nearly-exact
locally linear decompositions that provide insights into their internal
representations and reveal interpretable semantic structures in the next-token
prediction process.
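The local-linearity idea described above can be illustrated with a minimal NumPy sketch on a toy ReLU network. ReLU networks are piecewise linear, so if we "detach" (freeze) the activation pattern at a given input, the network becomes exactly linear in that input and its Jacobian reproduces the forward pass; the paper extends this to the smooth nonlinearities in LLMs, where the reproduction is nearly exact rather than exact. The network, weights, and function names below are illustrative, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer MLP with ReLU and no biases (a piecewise-linear map).
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((3, 8))

def forward(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

def detached_jacobian(x):
    # Freeze ("detach") the ReLU activation pattern at this input.
    # With the pattern fixed, the network is linear in x, so the
    # Jacobian is the matrix product W2 @ diag(mask) @ W1.
    mask = (W1 @ x > 0).astype(float)
    return W2 @ (mask[:, None] * W1)

x = rng.standard_normal(4)
J = detached_jacobian(x)

# Locally, the network IS the linear system J: J @ x equals forward(x).
assert np.allclose(J @ x, forward(x))

# As in the paper's analysis, the SVD of the detached Jacobian exposes
# the (here trivially small) subspace the map operates in.
singular_values = np.linalg.svd(J, compute_uv=False)
print(singular_values)
```

For an actual LLM the Jacobian would be taken with respect to the input embeddings via automatic differentiation, with gradient flow through the nonlinear terms strategically stopped so the remaining linear system reproduces the next-token prediction.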