Therefore I am. I Think

Abstract

We consider the question: when a large language reasoning model makes a choice, does it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe decodes tool-calling decisions from pre-generation activations with very high confidence, in some cases before a single reasoning token is produced. Activation steering supports this causally: perturbing the decision direction inflates deliberation and flips behavior in many examples (between 7% and 79%, depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.
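To make the two techniques named in the abstract concrete, here is a minimal sketch of a linear probe and of steering along the probe direction. It is not the paper's implementation. Everything in it is an assumption for illustration: the activations are synthetic Gaussian vectors with a planted "decision direction", the helper names (`make_synthetic_activations`, `fit_linear_probe`, `steer`) are hypothetical, and a "flip" is measured on the probe's own readout, whereas the paper measures flips in the model's actual tool-calling behavior.

```python
import numpy as np

def make_synthetic_activations(n=400, dim=32, seed=0):
    """Synthetic stand-in for pre-generation activations (an assumption).

    Each vector is unit Gaussian noise plus a shift of +/-3 along a hidden
    unit-norm direction, depending on a binary "call the tool?" label.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)
    noise = rng.normal(size=(n, dim))
    acts = noise + 3.0 * np.outer(2.0 * labels - 1.0, direction)
    return acts, labels

def fit_linear_probe(acts, labels, lr=0.1, steps=500):
    """Logistic-regression probe trained with plain gradient descent."""
    n, dim = acts.shape
    w, b = np.zeros(dim), 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = probs - labels              # gradient of the log loss w.r.t. logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_predict(acts, w, b):
    """Binary decision read out by the probe."""
    return (acts @ w + b > 0).astype(int)

def steer(acts, w, alpha):
    """Shift every activation by alpha along the unit-norm probe direction."""
    return acts + alpha * (w / np.linalg.norm(w))

acts, labels = make_synthetic_activations()
train, test = slice(0, 300), slice(300, 400)
w, b = fit_linear_probe(acts[train], labels[train])

# Held-out probe accuracy: how well the decision can be decoded linearly.
accuracy = (probe_predict(acts[test], w, b) == labels[test]).mean()

# Steer the "call the tool" examples against the decision direction and
# count how often the probe's readout changes.
pos = acts[test][labels[test] == 1]
steered = steer(pos, w, alpha=-6.0)
flip_rate = (probe_predict(steered, w, b) != probe_predict(pos, w, b)).mean()

print(f"probe accuracy: {accuracy:.2f}, flip rate under steering: {flip_rate:.2f}")
```

In a real experiment the activations would be taken from a model's residual stream at a chosen layer and token position, and the steered activations would be written back into the forward pass so that the generated chain-of-thought and the final action can be inspected.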
