Thought Anchors: Which LLM Reasoning Steps Matter?

Abstract

Reasoning large language models have recently achieved state-of-the-art performance in many fields. However, their long-form chain-of-thought reasoning creates interpretability challenges: each generated token depends on all previous ones, which makes the computation harder to decompose. We argue that analyzing reasoning traces at the sentence level is a promising approach to understanding reasoning processes. We present three complementary attribution methods: (1) a black-box method that measures each sentence's counterfactual importance by comparing final answers across 100 rollouts conditioned on the model generating that sentence or one with a different meaning; (2) a white-box method that aggregates attention patterns between pairs of sentences and identifies "broadcasting" sentences that receive disproportionate attention from all future sentences via "receiver" attention heads; (3) a causal attribution method that measures logical connections between sentences by suppressing attention toward one sentence and measuring the effect on each future sentence's tokens. Each method provides evidence for the existence of thought anchors: reasoning steps that have outsized importance and disproportionately influence the subsequent reasoning process. These thought anchors are typically planning or backtracking sentences. We provide an open-source tool (www.thought-anchors.com) for visualizing the outputs of our methods, and present a case study showing converging patterns across methods that map how a model performs multi-step reasoning. The consistency across methods demonstrates the potential of sentence-level analysis for a deeper understanding of reasoning models.
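As a rough illustration of the black-box method (1), the sketch below scores one sentence by comparing the final-answer distributions of two sets of rollouts: one in which the sentence was kept, and one in which it was replaced by a sentence with a different meaning. The abstract does not state how the final answers are compared, so the total variation distance used here is an assumption, as are the function names and the representation of rollouts as plain lists of answer strings.

```python
from collections import Counter


def answer_distribution(answers):
    """Turn a list of final answers into an empirical probability distribution."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: count / total for answer, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Returns 0.0 when the distributions are identical and 1.0 when they
    share no answers. This metric is an illustrative assumption, not
    necessarily the comparison used in the paper.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def counterfactual_importance(answers_with, answers_without):
    """Score a sentence by how much the final-answer distribution shifts
    when the sentence is replaced by one with a different meaning."""
    return total_variation(
        answer_distribution(answers_with),
        answer_distribution(answers_without),
    )


# Hypothetical data: 100 rollouts per condition, as in the abstract.
answers_with = ["42"] * 90 + ["17"] * 10      # sentence kept
answers_without = ["42"] * 50 + ["17"] * 50   # sentence replaced
score = counterfactual_importance(answers_with, answers_without)
print(f"counterfactual importance: {score:.2f}")
```

Under this scoring, a sentence whose replacement leaves the answer distribution unchanged gets a score near 0, while a candidate thought anchor, whose replacement redirects most rollouts to a different answer, gets a score closer to 1.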
