
I Think, Therefore I Am

April 2, 2026
Authors: Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani
cs.AI

Abstract

We consider the question: when a large language reasoning model makes a choice, does it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe successfully decodes tool-calling decisions from pre-generation activations with very high confidence, and in some cases even before a single reasoning token is produced. Activation steering supports this causally: perturbing the decision direction leads to inflated deliberation and flips behavior in many examples (between 7% and 79% of cases, depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought process often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.
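The probing setup described above can be illustrated with a minimal sketch. This is not the paper's code: the activations below are synthetic stand-ins generated to differ along a single "decision direction", mimicking an early-encoded binary choice (tool call vs. no tool call), and the probe is a simple mean-difference linear classifier. The dimensions, shift magnitude, and classifier choice are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_train, n_test = 64, 300, 100  # assumed sizes, for illustration only

# Simulated "decision direction": pre-generation activations for the two
# outcomes are assumed to differ along a single unit vector.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)

def make_batch(n):
    y = rng.integers(0, 2, size=n)                 # 1 = model will call a tool
    x = rng.normal(size=(n, d_model))              # background activation noise
    x += 2.0 * (2 * y[:, None] - 1) * direction    # shift +/-2.0 along the direction
    return x, y

x_tr, y_tr = make_batch(n_train)
x_te, y_te = make_batch(n_test)

# Mean-difference linear probe: the weight vector points from the class-0 mean
# to the class-1 mean; classify by which side of the midpoint a point falls on.
mu1 = x_tr[y_tr == 1].mean(axis=0)
mu0 = x_tr[y_tr == 0].mean(axis=0)
w = mu1 - mu0
b = -0.5 * (mu1 + mu0) @ w
pred = (x_te @ w + b > 0).astype(int)
acc = (pred == y_te).mean()
print(f"probe accuracy: {acc:.2f}")
```

Because the synthetic decision signal is linearly separable by construction, the probe recovers it with near-perfect accuracy; on real model activations, the paper reports similarly high-confidence decoding from pre-generation states.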