Large Language Models are Locally Linear Mappings

Abstract

We demonstrate that the inference operations of several open-weight large language models (LLMs) can be mapped to an exactly equivalent linear system for an input sequence without modifying the model weights or altering output predictions. Extending techniques from image diffusion models that exhibit local or piecewise linearity, we strategically alter the gradient computation with respect to a given input sequence for a next-token prediction such that the Jacobian of the model nearly exactly reproduces the forward prediction with a linear system. We demonstrate this approach across models (Llama 3, Gemma 3, Qwen 3, Phi 4, Mistral Ministral, and OLMo 2, up to Llama 3.3 70B Q4) and show through the singular value decomposition of the detached Jacobian that these LLMs operate in extremely low-dimensional subspaces where many of the largest singular vectors decode to concepts related to the most likely output token. This approach also allows us to examine the operation of each successive layer (and its attention and MLP components) as nearly exact linear systems and observe the emergence of semantic concepts. Despite their expressive power and global nonlinearity, modern LLMs can be interpreted through nearly exact locally linear decompositions that provide insights into their internal representations and reveal interpretable semantic structures in the next-token prediction process.
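
To make the idea of a detached Jacobian concrete, the following is a minimal NumPy sketch on a toy block, not the authors' implementation. It assumes a bias-free RMSNorm followed by a SwiGLU-style gated MLP with random weights; the names `forward`, `detached_jacobian`, `W_gate`, `W_up`, `W_down`, and `gain` are hypothetical, and attention is omitted entirely. Freezing every input-dependent scale at a chosen input `x_star` leaves a map that is linear in the input, so the resulting matrix should reproduce the nonlinear forward pass at that input up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 16  # toy embedding and hidden sizes (hypothetical)

# Random weights standing in for a trained block (assumption, no biases).
W_gate = rng.normal(size=(h, d))
W_up = rng.normal(size=(h, d))
W_down = rng.normal(size=(d, h))
gain = rng.normal(size=d)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x):
    """Nonlinear toy block: RMSNorm, then a SwiGLU-style gated MLP."""
    n = gain * x / np.sqrt(np.mean(x ** 2))
    pre = W_gate @ n
    return W_down @ (pre * sigmoid(pre) * (W_up @ n))


def detached_jacobian(x_star):
    """Freeze the input-dependent scales at x_star; the rest is linear in x."""
    norm_scale = gain / np.sqrt(np.mean(x_star ** 2))  # frozen RMSNorm scale
    n = norm_scale * x_star
    pre = W_gate @ n
    gate = pre * sigmoid(pre)                          # frozen SiLU gate values
    # W_down @ diag(gate) @ W_up @ diag(norm_scale)
    return W_down @ (gate[:, None] * W_up) @ np.diag(norm_scale)


x_star = rng.normal(size=d)
J = detached_jacobian(x_star)

# The linear system reproduces the nonlinear forward pass at x_star.
max_error = np.max(np.abs(J @ x_star - forward(x_star)))
print("max reconstruction error at x_star:", max_error)

# The equivalence is local: the same J is not expected to match at another input.
x_other = rng.normal(size=d)
print("matches at another input:", np.allclose(J @ x_other, forward(x_other)))

# Singular value decomposition of the detached Jacobian.
U, S, Vt = np.linalg.svd(J)
print("singular values:", S)
```

In this toy setting the singular values come from random weights, so they illustrate the mechanics only and say nothing about the low-dimensional structure reported for real LLMs. In an automatic-differentiation framework the same freezing would typically be done by detaching the normalization and gating terms from the gradient graph rather than by building the matrix by hand.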
