Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits

Abstract

Tokenization is the first, and often underappreciated, layer of computation in language models. Chain-of-Thought (CoT) prompting lets transformer models approximate recurrent computation by externalizing intermediate steps. We show, however, that the success of such reasoning is fundamentally bounded by the structure of the tokenized input. This work presents a theoretical and empirical investigation into how tokenization schemes impede symbolic computation by merging or obscuring atomic reasoning units. Subword-based methods such as byte-pair encoding (BPE) are a particular focus. We introduce the notion of Token Awareness to formalize how poor token granularity disrupts logical alignment and prevents models from generalizing symbolic procedures. We evaluate models systematically on arithmetic and symbolic tasks. The results show that token structure dramatically affects reasoning performance and causes failures even with CoT. Atomically aligned formats, by contrast, unlock strong generalization, allowing small models (e.g., GPT-4o-mini) to outperform larger systems (e.g., o1) in structured reasoning. Our findings reveal that symbolic reasoning ability in LLMs is not purely architectural but is deeply conditioned on token-level representations.
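To make "merging or obscuring atomic reasoning units" concrete, here is a toy sketch. It is not the paper's method and not any production tokenizer. The merge table `MERGES` and the helpers `bpe_encode` and `encode_text` are hypothetical and chosen for illustration. The sketch applies BPE-style merges in priority order to a word. It also assumes that an "atomically aligned" format can be approximated by separating every character or digit with whitespace. Under the merged encoding, no standalone `r` token survives in "strawberry", and "1234" becomes two two-digit tokens. The separated rendering keeps one symbol per token.

```python
from typing import List, Tuple

# Hypothetical toy merge table. Real BPE learns its merges from corpus
# statistics; these are hand-picked so the effect is easy to see.
# List order = merge priority (earlier rules are applied first).
MERGES: List[Tuple[str, str]] = [
    ("r", "r"), ("e", "rr"), ("b", "err"), ("berr", "y"),
    ("s", "t"), ("st", "r"), ("a", "w"),
    ("1", "2"), ("3", "4"),
]


def bpe_encode(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Apply each merge rule, in priority order, left to right over the symbols."""
    symbols = list(word)  # BPE starts from single characters
    for left, right in merges:
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)  # fuse the adjacent pair into one token
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def encode_text(text: str) -> List[str]:
    """Split on whitespace (a simplified pre-tokenizer), then BPE-encode each piece."""
    tokens: List[str] = []
    for piece in text.split():
        tokens.extend(bpe_encode(piece, MERGES))
    return tokens


word = "strawberry"
merged = encode_text(word)            # subword tokens: characters are fused
atomic = encode_text(" ".join(word))  # one character per token
print(merged)                          # ['str', 'aw', 'berry']
print(atomic.count("r"), merged.count("r"))         # 3 0
print(encode_text("1234"), encode_text("1 2 3 4"))  # ['12', '34'] ['1', '2', '3', '4']
```

A model operates on token IDs rather than characters. In the merged case, a question such as "how many r's?" or a digit-by-digit carry therefore has no token that corresponds directly to the unit being reasoned about. Real tokenizers learn far larger merge tables and treat whitespace differently, so their actual splits will differ from this toy.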
