대형 언어 모델은 지역적으로 선형 매핑을 수행한다.

초록

여러 개방형 가중치 대형 언어 모델(LLM)의 추론 연산이 모델 가중치를 수정하거나 출력 예측을 변경하지 않고도 입력 시퀀스에 대해 정확히 동등한 선형 시스템으로 매핑될 수 있음을 보여준다. 이미지 확산 모델에서 나타나는 국소적 또는 조각별 선형성을 확장하여, 다음 토큰 예측을 위한 주어진 입력 시퀀스에 대한 그래디언트 계산을 전략적으로 변경함으로써 모델의 야코비안이 선형 시스템으로 거의 정확하게 순방향 예측을 재현하도록 한다. 이 접근법을 여러 모델(Llama 3, Gemma 3, Qwen 3, Phi 4, Mistral Ministral 및 OLMo 2, 최대 Llama 3.3 70B Q4)에 걸쳐 시연하고, 분리된 야코비안의 특이값 분해를 통해 이러한 LLM이 매우 낮은 차원의 부분 공간에서 작동하며, 가장 큰 특이 벡터 중 다수가 가장 가능성 높은 출력 토큰과 관련된 개념으로 디코딩됨을 보여준다. 이 접근법은 또한 각 연속 레이어(그리고 그 어텐션 및 MLP 구성 요소)의 작동을 거의 정확한 선형 시스템으로 검토하고 의미론적 개념의 출현을 관찰할 수 있게 한다. 표현력과 전역 비선형성에도 불구하고, 현대 LLM은 거의 정확한 국소 선형 분해를 통해 해석될 수 있으며, 이는 내부 표현에 대한 통찰을 제공하고 다음 토큰 예측 과정에서 해석 가능한 의미 구조를 드러낸다.

English

We demonstrate that the inference operations of several open-weight large language models (LLMs) can be mapped to an exactly equivalent linear system for an input sequence without modifying the model weights or altering output predictions. Extending techniques from image diffusion models that exhibit local or piecewise linearity, we strategically alter the gradient computation with respect to a given input sequence for a next-token prediction such that the Jacobian of the model nearly exactly reproduces the forward prediction with a linear system. We demonstrate this approach across models (Llama 3, Gemma 3, Qwen 3, Phi 4, Mistral Ministral and OLMo 2, up to Llama 3.3 70B Q4) and show through the singular value decomposition of the detached Jacobian that these LLMs operate in extremely low-dimensional subspaces where many of the largest singular vectors decode to concepts related to the most-likely output token. This approach also allows us to examine the operation of each successive layer (and its attention and MLP components) as nearly-exact linear systems and observe the emergence of semantic concepts. Despite their expressive power and global nonlinearity, modern LLMs can be interpreted through nearly-exact locally linear decompositions that provide insights into their internal representations and reveal interpretable semantic structures in the next-token prediction process.

대형 언어 모델은 지역적으로 선형 매핑을 수행한다.

Large Language Models are Locally Linear Mappings

초록

Support