Large Language Models are Locally Linear Mappings

Abstract

We demonstrate that the inference operations of several open-weight large language models (LLMs) can be mapped to an exactly equivalent linear system for an input sequence without modifying the model weights or altering output predictions. Extending techniques from image diffusion models that exhibit local or piecewise linearity, we strategically alter the gradient computation with respect to a given input sequence for a next-token prediction such that the Jacobian of the model nearly exactly reproduces the forward prediction with a linear system. We demonstrate this approach across models (Llama 3, Gemma 3, Qwen 3, Phi 4, Mistral Ministral and OLMo 2, up to Llama 3.3 70B Q4) and show through the singular value decomposition of the detached Jacobian that these LLMs operate in extremely low-dimensional subspaces where many of the largest singular vectors decode to concepts related to the most likely output token. This approach also allows us to examine the operation of each successive layer (and its attention and MLP components) as nearly exact linear systems and observe the emergence of semantic concepts. Despite their expressive power and global nonlinearity, modern LLMs can be interpreted through nearly exact locally linear decompositions that provide insights into their internal representations and reveal interpretable semantic structures in the next-token prediction process.
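The idea of a Jacobian that reproduces the forward pass can be illustrated on a toy case. The sketch below is not the authors' code: it assumes a hypothetical bias-free two-layer network with a SiLU activation, written in plain Python, and the names `forward`, `detached_jacobian`, `W1`, `W2` and `x` are invented for illustration. Holding the gate term `sigmoid(W1 x)` fixed at its value for a given input (the analogue of detaching it from the gradient computation) leaves a map that is linear in `x`, so the resulting Jacobian applied to `x` should match the network output up to floating-point error.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def forward(W1, W2, x):
    """Toy bias-free network: y = W2 @ SiLU(W1 @ x), SiLU(h) = h * sigmoid(h)."""
    h = matvec(W1, x)
    return matvec(W2, [hi * sigmoid(hi) for hi in h])


def detached_jacobian(W1, W2, x):
    """Jacobian of the toy network with the gate sigmoid(W1 @ x) held constant.

    With the gate g frozen at its value for this particular x, the network
    reduces to the linear map J = W2 @ diag(g) @ W1.
    """
    g = [sigmoid(hi) for hi in matvec(W1, x)]
    gated_W1 = [[gi * w for w in row] for gi, row in zip(g, W1)]
    cols = list(zip(*gated_W1))  # columns of diag(g) @ W1
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in W2]


# Hypothetical weights and input, chosen arbitrarily for the illustration.
W1 = [[0.5, -1.0, 0.3], [1.2, 0.4, -0.7], [-0.6, 0.9, 0.1], [0.2, -0.3, 0.8]]
W2 = [[0.7, -0.2, 0.5, 1.1], [-0.4, 0.6, 0.3, -0.9]]
x = [1.0, -2.0, 0.5]

y = forward(W1, W2, x)               # nonlinear forward pass
J = detached_jacobian(W1, W2, x)     # linear system valid at this x
y_linear = matvec(J, x)              # prediction of the linear system
max_error = max(abs(a - b) for a, b in zip(y, y_linear))
print(max_error)
```

The matrix `J` depends on the input it was computed for, which is the sense in which the mapping is only locally linear: a different `x` gives a different gate and therefore a different `J`.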
