LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Abstract

We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information. These methods reveal that tokens often regarded as minor (e.g., determiners and punctuation) carry a surprisingly large amount of contextual information. Notably, removing these tokens, especially stopwords, articles, and commas, consistently degrades performance on MMLU and BABILong-4k, even when only task-irrelevant tokens are removed. Our analysis also shows a strong correlation between contextualization and linearity. Here, linearity measures how closely the transformation from one layer's embeddings to the next can be approximated by a single linear mapping. These findings underscore the hidden importance of filler tokens in maintaining context. For further exploration, we present LLM-Microscope, an open-source toolkit that assesses token-level nonlinearity, evaluates contextual memory, visualizes intermediate-layer contributions (via an adapted Logit Lens), and measures the intrinsic dimensionality of representations. The toolkit shows how seemingly trivial tokens can be critical for long-range understanding.
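The linearity measure described above can be illustrated with a minimal sketch. The sketch makes four assumptions:

* Hidden states for the same tokens have already been extracted at layers l and l+1 as NumPy arrays of shape (n_tokens, hidden_dim).
* Both matrices are centered and scaled to unit Frobenius norm.
* A single linear map is fitted by least squares.
* The score is the fraction of the normalized target that this map explains.

The function name `linearity_score` is hypothetical and is not part of the LLM-Microscope API. The paper's exact normalization may differ, for example by using a Procrustes-style similarity instead of an unconstrained least-squares fit.

```python
import numpy as np


def linearity_score(x_l: np.ndarray, x_next: np.ndarray) -> float:
    """Illustrative linearity score between two layers (assumed definition).

    x_l, x_next: arrays of shape (n_tokens, hidden_dim) holding the same
    tokens' embeddings at layer l and layer l+1.
    Returns a value in [0, 1]; 1 means layer l+1 is an exact linear
    (plus constant offset) function of layer l.
    """
    a = np.asarray(x_l, dtype=float)
    b = np.asarray(x_next, dtype=float)
    # Center each matrix so constant offsets (biases) do not count as nonlinearity.
    a = a - a.mean(axis=0, keepdims=True)
    b = b - b.mean(axis=0, keepdims=True)
    # Scale to unit Frobenius norm so the score ignores the overall magnitude.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # Fit one linear map A that minimizes ||a @ A - b||_F.
    A, *_ = np.linalg.lstsq(a, b, rcond=None)
    residual = np.linalg.norm(a @ A - b) ** 2
    # b has unit norm and A = 0 gives residual 1, so residual lies in [0, 1].
    return float(1.0 - residual)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(500, 16))
    W = rng.normal(size=(16, 16))
    # Exactly linear (affine) transition: score is 1 up to floating-point error.
    print(linearity_score(x, x @ W + 3.0))
    # Unrelated "next layer": score is near 0.
    print(linearity_score(x, rng.normal(size=(500, 16))))
```

With more free parameters (hidden_dim squared) than constraints, the fit becomes trivially exact. For the score to be meaningful, n_tokens should therefore be well above hidden_dim. Computing the score on individual tokens, as the toolkit's token-level view suggests, would require a different, per-token residual definition.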
