LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
February 20, 2025
Authors: Anton Razzhigaev, Matvey Mikhalchuk, Temurbek Rahmatullaev, Elizaveta Goncharova, Polina Druzhinina, Ivan Oseledets, Andrey Kuznetsov
cs.AI
Abstract
We introduce methods to quantify how Large Language Models (LLMs) encode and
store contextual information, revealing that tokens often seen as minor (e.g.,
determiners, punctuation) carry surprisingly high context. Notably, removing
these tokens -- especially stopwords, articles, and commas -- consistently
degrades performance on MMLU and BABILong-4k, even if removing only irrelevant
tokens. Our analysis also shows a strong correlation between contextualization
and linearity, where linearity measures how closely the transformation from one
layer's embeddings to the next can be approximated by a single linear mapping.
These findings underscore the hidden importance of filler tokens in maintaining
context. For further exploration, we present LLM-Microscope, an open-source
toolkit that assesses token-level nonlinearity, evaluates contextual memory,
visualizes intermediate layer contributions (via an adapted Logit Lens), and
measures the intrinsic dimensionality of representations. This toolkit
illuminates how seemingly trivial tokens can be critical for long-range
understanding.
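The linearity measure described in the abstract can be illustrated with a short, self-contained sketch. The example below fits a single least-squares linear map between consecutive layers' hidden states and reports how much of the transition it explains. This is a simplified illustration under stated assumptions (a single sequence, no centering or normalization, GPT-2 as a stand-in model), not the paper's exact metric.

```python
# Hedged sketch: per-layer linearity score for a Hugging Face causal LM.
# Fits h_in @ W ~= h_out by least squares and reports 1 - relative residual.
# Simplified illustration only; the paper's procedure may differ in details.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    # Tuple of (num_layers + 1) tensors, each (batch, seq_len, d_model)
    hidden = model(**inputs, output_hidden_states=True).hidden_states

def linearity_score(h_in: torch.Tensor, h_out: torch.Tensor) -> float:
    """1 - relative residual of the best least-squares map h_in @ W ~= h_out."""
    W = torch.linalg.lstsq(h_in, h_out).solution
    residual = h_out - h_in @ W
    return 1.0 - (residual.norm() / h_out.norm()).item()

for l in range(len(hidden) - 1):
    score = linearity_score(hidden[l][0], hidden[l + 1][0])
    print(f"layer {l:2d} -> {l + 1:2d}: linearity ~ {score:.3f}")
```

A score near 1 means the layer-to-layer transformation for this input is almost perfectly captured by one linear map; lower scores indicate more nonlinear processing at that layer.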
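The toolkit's layer-contribution visualization builds on the Logit Lens. The sketch below shows the basic, unadapted technique: each layer's hidden state for the last token is decoded through the final LayerNorm and the unembedding matrix, revealing what the model "believes" the next token is at every depth. The module names (`transformer.ln_f`, `lm_head`) follow GPT-2's naming and are assumptions for other architectures; the paper's adapted variant differs in details not shown here.

```python
# Hedged sketch of a basic Logit Lens for a GPT-2-style model.
# The paper uses an adapted variant; this shows only the core idea.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states
    ln_f = model.transformer.ln_f  # final LayerNorm (GPT-2 module naming)
    unembed = model.lm_head        # (tied) unembedding projection to vocab
    for l, h in enumerate(hidden):
        # Decode the last token's hidden state at layer l into vocab logits
        logits = unembed(ln_f(h[0, -1]))
        print(f"layer {l:2d}: {tok.decode(logits.argmax().item())!r}")
```

Reading the per-layer predictions top to bottom shows how the eventual answer emerges across depth, which is the kind of intermediate-layer contribution the toolkit visualizes.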