Counting Ability of Large Language Models and Impact of Tokenization

Abstract

Transformers, the backbone of modern large language models (LLMs), face inherent architectural limitations that impede their reasoning capabilities. Unlike recurrent networks, Transformers lack recurrent connections, which confines them to constant-depth computation. This restriction places them in the complexity class TC^0, making them theoretically incapable of solving tasks that demand increasingly deep reasoning as input length grows. Counting, a fundamental component of many reasoning tasks, is one such task: performing it inductively requires reasoning depth that grows linearly with input length. While previous studies have established the upper limits of counting ability in Transformer-based expert models (i.e., models specifically trained for counting tasks), these findings do not directly extend to general-purpose LLMs because their reasoning mechanisms differ. Recent work has highlighted how Chain of Thought (CoT) reasoning can help alleviate some of the architectural limitations of Transformers in counting tasks. However, little attention has been paid to the role of tokenization in these models. Unlike expert models, which often use character-level tokenization, LLMs typically rely on byte-level byte-pair encoding (BPE) tokenizers, a choice that fundamentally alters the way reasoning is processed. Our work investigates the impact of tokenization on the counting abilities of LLMs, uncovering substantial performance variations that depend on how the input is tokenized. We provide both theoretical and experimental analyses, offering insights into how tokenization choices can undermine models' theoretical computability, thereby inspiring the design of new tokenization methods to enhance reasoning in LLMs.
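
To make the tokenization concern concrete, the following is a minimal toy sketch, not the method or the experiments of the paper. It assumes a hypothetical merge vocabulary (`TOY_MERGES`) and a greedy longest-match segmenter (`toy_merge_tokenize`) standing in for a real BPE tokenizer, and a counter (`count_atomic`) that treats every token as an opaque symbol. All of these names are illustrative assumptions. The counter performs one update per token, which mirrors the inductive, linear-depth view of counting described above.

```python
from typing import List

# Hypothetical merge vocabulary; a real BPE tokenizer learns its merges from data.
TOY_MERGES = ["ab", "ba"]


def char_tokenize(text: str) -> List[str]:
    """Character-level tokenization: one token per character."""
    return list(text)


def toy_merge_tokenize(text: str, merges: List[str]) -> List[str]:
    """Greedy longest-match segmentation over the toy merge vocabulary.

    This only imitates the effect of subword merging; it is not real BPE.
    """
    vocab = sorted(merges, key=len, reverse=True)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for piece in vocab:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # No merged piece matches here, so fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def count_atomic(tokens: List[str], target: str) -> int:
    """Inductive counting with one counter update per token.

    Tokens are compared as opaque symbols, so a target character hidden
    inside a merged token is not seen.
    """
    count = 0
    for tok in tokens:
        if tok == target:
            count += 1
    return count


text = "abaabba"
char_tokens = char_tokenize(text)
merged_tokens = toy_merge_tokenize(text, TOY_MERGES)

print(char_tokens, count_atomic(char_tokens, "a"))
print(merged_tokens, count_atomic(merged_tokens, "a"))
```

With character-level tokens the counter sees every occurrence of `a`, whereas with the merged tokens most occurrences are hidden inside multi-character symbols. In this toy setting, that is the kind of gap between tokenizations that the abstract refers to.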
