Learning to Reason via Mixture-of-Thought for Logical Reasoning

Abstract

Human beings naturally use multiple reasoning modalities, i.e., different representational formats such as natural language, code, and symbolic logic, to learn and solve logical problems. In contrast, most existing LLM-based approaches use a single reasoning modality during training, typically natural language. Although some methods have explored modality selection or augmentation at inference time, the training process remains modality-blind, which limits synergy among modalities. To fill this gap, we propose Mixture-of-Thought (MoT), a framework that enables LLMs to reason across three complementary modalities: natural language, code, and a newly introduced symbolic modality, the truth table, which systematically enumerates logical cases and partially mitigates key failure modes of natural language reasoning. MoT adopts a two-phase design: (1) self-evolving MoT training, which jointly learns from filtered, self-generated rationales across modalities; and (2) MoT inference, which fully leverages the synergy of the three modalities to produce better predictions. Experiments on logical reasoning benchmarks, including FOLIO and ProofWriter, show that MoT consistently and significantly outperforms strong LLM baselines that use single-modality chain-of-thought, achieving average accuracy gains of up to +11.7 percentage points. Further analyses show that MoT benefits both the training and the inference stage, that it is particularly effective on harder logical reasoning problems, and that the different modalities contribute complementary strengths, with truth-table reasoning helping to overcome key bottlenecks in natural language inference.
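To make the truth-table modality more concrete, here is a minimal Python sketch of the underlying idea. It is not the authors' implementation: in MoT the LLM writes the truth table out as a reasoning trace, whereas this sketch enumerates it programmatically. It assumes the premises have already been translated into propositional formulas, written here as Python predicates over one row of the table. The function name `truth_table_verdict` and the True/False/Uncertain label set (the kind of three-way labels used in FOLIO) are illustrative assumptions.

```python
from itertools import product
from typing import Callable, Dict, List

# A formula maps one truth-table row (atom name -> truth value) to a truth value.
Formula = Callable[[Dict[str, bool]], bool]


def truth_table_verdict(
    atoms: List[str], premises: List[Formula], conclusion: Formula
) -> str:
    """Label a conclusion by enumerating all 2**len(atoms) truth assignments.

    Returns "True" if the conclusion holds in every row that satisfies all
    premises, "False" if it fails in every such row, and "Uncertain" otherwise.
    If the premises are unsatisfiable, no row survives and this sketch
    returns "Uncertain".
    """
    holds_somewhere = False
    fails_somewhere = False
    for values in product([True, False], repeat=len(atoms)):
        row = dict(zip(atoms, values))
        if not all(p(row) for p in premises):
            continue  # this row contradicts a premise, so it is not a possible case
        if conclusion(row):
            holds_somewhere = True
        else:
            fails_somewhere = True
    if holds_somewhere and not fails_somewhere:
        return "True"
    if fails_somewhere and not holds_somewhere:
        return "False"
    return "Uncertain"


# Hypothetical premises: "If it rains, the ground is wet" and "It rains".
def rain_implies_wet(r: Dict[str, bool]) -> bool:
    return (not r["rain"]) or r["wet"]


def it_rains(r: Dict[str, bool]) -> bool:
    return r["rain"]


atoms = ["rain", "wet"]
premises = [rain_implies_wet, it_rains]
print(truth_table_verdict(atoms, premises, lambda r: r["wet"]))      # True
print(truth_table_verdict(atoms, premises, lambda r: not r["wet"]))  # False
# With only the implication, "wet" is not forced either way.
print(truth_table_verdict(atoms, [rain_implies_wet], lambda r: r["wet"]))  # Uncertain
```

Enumerating all 2^n rows is exact for propositional premises, but the table grows exponentially with the number of atoms n, so this style of exhaustive case analysis is only practical when a problem involves a handful of atoms.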
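This abstract does not say how MoT inference combines the three modalities. One simple possibility, assumed here purely for illustration, is a majority vote over the answers parsed from each modality's rationale. The function name `vote_across_modalities`, the modality keys, and the tie-breaking priority below are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def vote_across_modalities(
    predictions: Dict[str, Optional[str]],
    priority: Sequence[str] = ("natural_language", "code", "truth_table"),
) -> Optional[str]:
    """Combine one predicted answer per modality by majority vote.

    `predictions` maps a modality name to its parsed answer, or None when the
    rationale yielded no parsable answer. On a tie, the answer of the first
    modality in `priority` that voted for a tied answer wins.
    """
    votes = Counter(a for a in predictions.values() if a is not None)
    if not votes:
        return None  # no modality produced a usable answer
    top = max(votes.values())
    tied = {a for a, c in votes.items() if c == top}
    for modality in priority:
        if predictions.get(modality) in tied:
            return predictions[modality]
    return sorted(tied)[0]  # deterministic fallback for unlisted modality names


print(vote_across_modalities(
    {"natural_language": "True", "code": "Uncertain", "truth_table": "True"}))  # True
print(vote_across_modalities(
    {"natural_language": "False", "code": None, "truth_table": "True"}))  # False
```

In the second call the two valid answers tie one vote each, so the hypothetical priority order decides. A voting scheme like this only helps when the modalities make partly independent errors, which is the kind of complementarity the abstract reports.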
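Similarly, this abstract does not spell out how self-generated rationales are filtered before training. The sketch below assumes a STaR-style rule: keep a rationale only if its parsed answer matches the gold label, applied to every modality and with exact duplicates dropped. The `Rationale` record and the `filter_rationales` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Rationale:
    modality: str          # e.g. "natural_language", "code", or "truth_table"
    text: str              # self-generated reasoning trace
    answer: Optional[str]  # answer parsed from the trace; None if unparsable


def filter_rationales(
    samples: Dict[str, List[Rationale]],
    gold: Dict[str, str],
) -> List[Tuple[str, Rationale]]:
    """Keep (question_id, rationale) pairs whose parsed answer matches the gold label.

    Rationales with no parsable answer, an empty trace, or a wrong answer are
    discarded. Exact duplicate traces for the same question and modality are
    kept only once.
    """
    kept: List[Tuple[str, Rationale]] = []
    seen = set()
    for qid, rationales in samples.items():
        label = gold.get(qid)
        for r in rationales:
            if r.answer is None or r.answer != label or not r.text.strip():
                continue
            key = (qid, r.modality, r.text.strip())
            if key in seen:
                continue
            seen.add(key)
            kept.append((qid, r))
    return kept


samples = {
    "q1": [
        Rationale("natural_language", "Premises 1 and 2 imply wet.", "True"),
        Rationale("code", "def solve(): ...", "False"),
        Rationale("truth_table", "rain=T, wet=T -> wet holds", "True"),
        Rationale("truth_table", "rain=T, wet=T -> wet holds", "True"),  # duplicate
    ],
}
gold = {"q1": "True"}
kept = filter_rationales(samples, gold)
print([(qid, r.modality) for qid, r in kept])
# [('q1', 'natural_language'), ('q1', 'truth_table')]
```

Under this assumption, the kept set mixes natural-language, code, and truth-table traces, so a single fine-tuning round can learn from all three modalities jointly.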
