Therefore I am. I Think

Abstract

We consider the question: when a large language reasoning model makes a choice, does it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape the chain-of-thought in reasoning models. Specifically, we show that a simple linear probe decodes tool-calling decisions from pre-generation activations with very high confidence, in some cases before a single reasoning token is produced. Activation steering supports this causally: perturbing the decision direction inflates deliberation and flips behavior in many examples (between 7% and 79%, depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.
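To make the two techniques named in the abstract concrete, the following is a minimal sketch of a linear probe and of steering along the probe direction. It is not the paper's implementation: the activations are synthetic Gaussian vectors standing in for pre-generation hidden states, the probe is plain logistic regression trained by gradient descent, and every name (`make_synthetic_activations`, `fit_linear_probe`, `probe_predict`, `steer`) and every constant is a hypothetical choice made for illustration.

```python
import numpy as np


def make_synthetic_activations(n=400, dim=32, seed=0):
    """Hypothetical stand-in for pre-generation hidden states (not real model data)."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)       # 1 = "call tool", 0 = "do not call"
    signs = 2.0 * labels - 1.0                # map {0, 1} to {-1, +1}
    # Unit Gaussian noise plus a class-dependent shift along one hidden direction.
    acts = rng.normal(size=(n, dim)) + 2.0 * signs[:, None] * direction
    return acts, labels, direction


def fit_linear_probe(acts, labels, lr=0.1, steps=500):
    """Logistic-regression probe fitted with full-batch gradient descent."""
    n, dim = acts.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = probs - labels                 # gradient of the log-loss w.r.t. the logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_predict(acts, w, b):
    """Predict 1 where the probe logit is positive, else 0."""
    return (acts @ w + b > 0).astype(int)


def steer(acts, w, alpha):
    """Shift every activation by alpha along the unit-norm probe direction."""
    unit = w / np.linalg.norm(w)
    return acts + alpha * unit


if __name__ == "__main__":
    acts, labels, _ = make_synthetic_activations()
    w, b = fit_linear_probe(acts[:300], labels[:300])
    held_out, held_labels = acts[300:], labels[300:]
    accuracy = (probe_predict(held_out, w, b) == held_labels).mean()
    print("held-out probe accuracy:", accuracy)
    steered = steer(held_out, w, alpha=-8.0)
    print("fraction predicted 'call tool' after steering:",
          probe_predict(steered, w, b).mean())
```

In this toy setting, steering only changes what the probe reads off the shifted vectors. Showing that a model's actual behavior flips, as the abstract reports, would require adding the shift to a real model's hidden states during generation.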
