Elucidating Algorithmic Deduction Circuits for Logical Reasoning

Abstract

Recent studies have shown that Large Language Models (LLMs) can achieve strong reasoning performance when few-shot prompts incorporate functional symbolic representations that abstractly describe graph traversal algorithms and step-by-step reasoning. However, it remains unclear how LLMs come to grasp the abstract meaning of each reasoning step, and of the overall algorithm, from only a limited number of demonstrations. This work aims to localize the attention heads responsible for individual reasoning steps and to characterize the types of information transferred among them. We first align the constituent reasoning steps with their corresponding token logits under a symbolic-aided Chain-of-Thought (CoT) prompting framework. Our analysis shows that the token positions that steer the reasoning process are associated with low confidence scores, which arise from the constraint of conforming to the reasoning behavior patterns in the demonstrations. We then adopt causal mediation analysis techniques to identify the attention heads responsible for these patterns. Our findings further indicate that LLMs retrieve factual and rule-based information for individual sub-reasoning tasks through specialized attention heads (approximately 3% of all heads), whereas higher layers predominantly facilitate information integration and the emergence of global reasoning strategies (e.g., graph traversal algorithms) that coordinate multiple intermediate reasoning steps to solve the overall task.
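
The two analysis steps named in the abstract, flagging low-confidence token positions and attributing a behavior to individual attention heads through causal mediation, can be illustrated with a small sketch. This is not the setup used in this work: it assumes a toy linear model in which each "head" writes one vector into a shared residual stream, takes the maximum softmax probability as the confidence score, and uses single-head activation patching as a simple form of causal mediation analysis. All names (`low_confidence_positions`, `patching_effects`, `gains`, the 0.5 threshold) are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def low_confidence_positions(logits, threshold=0.5):
    # logits: (n_positions, vocab). Confidence is assumed here to be the
    # top-1 softmax probability; returns positions below the threshold.
    conf = softmax(logits, axis=-1).max(axis=-1)
    return np.where(conf < threshold)[0]

def head_outputs(x, head_weights):
    # One output vector per toy head: shape (n_heads, d_model).
    return np.stack([W @ x for W in head_weights])

def logits_from_heads(outs, readout):
    # Residual stream = sum of head outputs; readout maps it to vocab logits.
    return readout @ outs.sum(axis=0)

def patching_effects(x_clean, x_corrupt, head_weights, readout, target):
    # For each head, run on the corrupted input but restore that head's
    # clean output, and record the change in the target token probability.
    clean = head_outputs(x_clean, head_weights)
    corrupt = head_outputs(x_corrupt, head_weights)
    base = softmax(logits_from_heads(corrupt, readout))[target]
    effects = []
    for h in range(len(head_weights)):
        patched = corrupt.copy()
        patched[h] = clean[h]
        p = softmax(logits_from_heads(patched, readout))[target]
        effects.append(p - base)
    return np.array(effects)

# Toy setup (hypothetical): head h reads only input coordinate h and
# writes to logit h with a fixed gain; head 2 has the largest gain.
n = 4
gains = [1.0, 1.0, 3.0, 1.0]
head_weights = [gains[h] * np.outer(np.eye(n)[h], np.eye(n)[h]) for h in range(n)]
readout = np.eye(n)

x_clean = np.array([1.0, 1.0, 1.0, 1.0])
x_corrupt = np.array([1.0, 1.0, -1.0, 1.0])  # differs only in coordinate 2

effects = patching_effects(x_clean, x_corrupt, head_weights, readout, target=2)
responsible_head = int(np.argmax(effects))

position_logits = np.array([[5.0, 0.0, 0.0],
                            [0.1, 0.0, 0.0],
                            [0.0, 4.0, 0.0]])
uncertain = low_confidence_positions(position_logits)
```

In this toy case the clean and corrupted inputs differ only in the coordinate read by head 2, so restoring that head alone recovers the target probability while patching any other head leaves it unchanged. In a real transformer the same loop would run over cached head activations (for example via forward hooks) rather than over explicit weight matrices.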