Identifying Algorithmic Deductive Circuits for Logical Reasoning

Abstract

Recent studies have shown that Large Language Models (LLMs) can achieve strong reasoning performance in few-shot settings by incorporating functional symbolic representations that abstractly describe graph traversal algorithms and step-by-step reasoning. However, it remains unclear how LLMs genuinely understand the abstract meaning of each reasoning step and the overall algorithm from only a handful of demonstrations. This work aims to localize the attention heads responsible for individual reasoning steps and to characterize the types of information transferred among them. We first align the constituent reasoning steps with their corresponding token logits under a symbolic-aided Chain-of-Thought (CoT) prompting framework. Our analysis shows that the token positions that steer the reasoning process are associated with low confidence scores, which arise from the constraint of conforming to the reasoning behavior patterns in the demonstrations. We then apply causal mediation analysis to identify the attention heads responsible for these patterns. Our findings further indicate that LLMs retrieve factual and rule-based information for individual sub-reasoning tasks through specialized attention heads (approximately 3% of all heads), whereas higher layers predominantly facilitate information integration and the emergence of global reasoning strategies (e.g., graph traversal algorithms) that coordinate multiple intermediate reasoning steps to solve the overall task.
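
To make the two measurements named in the abstract concrete, the following is a minimal sketch on a toy model, not the setup used in this work. It assumes that a token's confidence score is the softmax probability of the top-scoring token, and it illustrates causal mediation analysis as activation patching: the output of one head from a clean run is copied into a corrupted run, and the recovered probability of the correct token is taken as that head's indirect effect. The "heads" here are plain linear maps, and all names (`confidence`, `head_outputs`, `indirect_effects`) and weights are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def confidence(logits):
    """Assumed definition: probability assigned to the top-scoring token."""
    return float(softmax(logits).max())

def head_outputs(x, head_weights):
    """Toy 'heads': each head is just a linear map applied to the input."""
    return [W @ x for W in head_weights]

def logits_from_heads(outputs, W_out):
    """Sum the head outputs (a stand-in for the residual stream), then project."""
    return W_out @ np.sum(outputs, axis=0)

def indirect_effects(x_clean, x_corrupt, head_weights, W_out, target):
    """Patch one clean head output at a time into the corrupted run and
    report how much probability of `target` each patch restores."""
    clean = head_outputs(x_clean, head_weights)
    corrupt = head_outputs(x_corrupt, head_weights)
    base = softmax(logits_from_heads(corrupt, W_out))[target]
    effects = []
    for h in range(len(head_weights)):
        patched = list(corrupt)
        patched[h] = clean[h]    # intervene on head h only
        p = softmax(logits_from_heads(patched, W_out))[target]
        effects.append(float(p - base))
    return effects

# Hypothetical weights: head 0 is inactive, head 1 carries the signal,
# head 2 carries a weak copy of it.
head_weights = [np.zeros((3, 3)), np.eye(3), 0.1 * np.eye(3)]
W_out = 4.0 * np.eye(3)
x_clean = np.array([1.0, 0.0, 0.0])    # correct answer is token 0
x_corrupt = np.array([0.0, 1.0, 0.0])  # corrupted prompt points to token 1

effects = indirect_effects(x_clean, x_corrupt, head_weights, W_out, target=0)
most_important_head = int(np.argmax(effects))
print(effects, most_important_head)
```

In this toy setting the patch on head 1 restores most of the probability of the correct token, so it is ranked as the most important head. In a real transformer the same loop would run over hooked attention-head outputs at specific token positions instead of these linear maps.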