LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Abstract

We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry a surprisingly large amount of contextual information. Notably, removing these tokens (especially stopwords, articles, and commas) consistently degrades performance on MMLU and BABILong-4k, even when only irrelevant tokens are removed. Our analysis also shows a strong correlation between contextualization and linearity, where linearity measures how closely the transformation from one layer's embeddings to the next can be approximated by a single linear mapping. These findings underscore the hidden importance of filler tokens in maintaining context. For further exploration, we present LLM-Microscope, an open-source toolkit that assesses token-level nonlinearity, evaluates contextual memory, visualizes intermediate layer contributions (via an adapted Logit Lens), and measures the intrinsic dimensionality of representations. This toolkit illuminates how seemingly trivial tokens can be critical for long-range understanding.
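
The notion of linearity used above, how well a single linear map carries one layer's embeddings to the next, can be illustrated with a small sketch. This is not the toolkit's implementation: the function name `linearity_score`, the centering and Frobenius normalization, and the score defined as one minus the squared residual of a least-squares fit are all assumptions chosen for illustration.

```python
import numpy as np

def linearity_score(X, Y):
    """Hypothetical score in [0, 1]: how well one linear map sends X to Y.

    X, Y: arrays of shape (n_tokens, d) holding embeddings of the same
    tokens at two consecutive layers (assumed non-constant).
    """
    # Center and scale to unit Frobenius norm (an assumption of this
    # sketch) so the score ignores shifts and overall scale.
    Xc = X - X.mean(axis=0, keepdims=True)
    Yc = Y - Y.mean(axis=0, keepdims=True)
    Xc = Xc / np.linalg.norm(Xc)
    Yc = Yc / np.linalg.norm(Yc)
    # Best single linear map A in the least-squares sense: min ||Xc A - Yc||_F.
    A, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    residual = np.linalg.norm(Xc @ A - Yc) ** 2
    return 1.0 - residual

# Synthetic stand-ins for layer embeddings (not real model activations).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
W = rng.normal(size=(16, 16))
print(linearity_score(X, X @ W))               # exactly linear: close to 1
print(linearity_score(X, np.tanh(3 * X @ W)))  # saturating map: noticeably lower
```

Under these assumptions, a score near 1 means the layer-to-layer transformation is almost entirely explained by one linear mapping, while lower values indicate a larger nonlinear component.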
