Thought Anchors: Which LLM Reasoning Steps Matter?

Abstract

Reasoning large language models have recently achieved state-of-the-art performance in many fields. However, their long-form chain-of-thought reasoning creates interpretability challenges: each generated token depends on all previous ones, which makes the computation harder to decompose. We argue that analyzing reasoning traces at the sentence level is a promising approach to understanding reasoning processes. We present three complementary attribution methods: (1) a black-box method that measures each sentence's counterfactual importance by comparing final answers across 100 rollouts conditioned on the model generating that sentence or one with a different meaning; (2) a white-box method that aggregates attention patterns between pairs of sentences and identifies "broadcasting" sentences that receive disproportionate attention from all future sentences via "receiver" attention heads; (3) a causal attribution method that measures logical connections between sentences by suppressing attention toward one sentence and measuring the effect on each future sentence's tokens. Each method provides evidence for the existence of thought anchors, reasoning steps that have outsized importance and that disproportionately influence the subsequent reasoning process. These thought anchors are typically planning or backtracking sentences. We provide an open-source tool (www.thought-anchors.com) for visualizing the outputs of our methods, and present a case study showing converging patterns across methods that map how a model performs multi-step reasoning. The consistency across methods demonstrates the potential of sentence-level analysis for a deeper understanding of reasoning models.
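
As a rough illustration of the black-box method (1), the sketch below scores one sentence by comparing the final answers from two sets of rollouts: one where the sentence is kept and one where it is replaced by a sentence with a different meaning. The abstract does not say how the answers are compared, so the use of total variation distance here is an assumption, and the function names (`answer_distribution`, `total_variation`, `counterfactual_importance`) are hypothetical rather than taken from the authors' code.

```python
from collections import Counter


def answer_distribution(answers):
    """Empirical distribution over final answers from a list of rollouts."""
    if not answers:
        raise ValueError("need at least one rollout")
    counts = Counter(answers)
    total = len(answers)
    return {answer: n / total for answer, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two answer distributions (0 to 1).

    Assumption: the abstract does not name the comparison metric; this
    distance is used only as a simple stand-in.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def counterfactual_importance(answers_with_sentence, answers_with_replacement):
    """Score a sentence by how much replacing it shifts the final answers."""
    p = answer_distribution(answers_with_sentence)
    q = answer_distribution(answers_with_replacement)
    return total_variation(p, q)


if __name__ == "__main__":
    # Made-up final answers from 100 rollouts per condition.
    kept = ["42"] * 80 + ["41"] * 20
    replaced = ["42"] * 30 + ["41"] * 60 + ["7"] * 10
    print(round(counterfactual_importance(kept, replaced), 3))
```

A score near 0 means replacing the sentence barely changes the final answers, while a score near 1 means the answers diverge almost completely, which is the kind of signal one would expect from a thought anchor. The rollout answers in the example are invented for illustration only.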
