ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

Abstract

Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain generalization capabilities. However, the mechanisms underlying this transfer remain poorly understood. We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes: fundamental reasoning patterns that capture the essence of problems across domains. These prototypes minimize the nuances of the representation, revealing that seemingly diverse tasks are grounded in shared reasoning structures. Based on this hypothesis, we propose ProtoReasoning, a framework that enhances the reasoning ability of LLMs by leveraging scalable and verifiable prototypical representations (Prolog for logical reasoning, PDDL for planning). ProtoReasoning features: (1) an automated prototype construction pipeline that transforms problems into corresponding prototype representations; (2) a comprehensive verification system that provides reliable feedback through Prolog/PDDL interpreters; (3) the scalability to synthesize problems arbitrarily within prototype space while ensuring correctness. Extensive experiments show that ProtoReasoning achieves a 4.7% improvement over baseline models on logical reasoning (Enigmata-Eval), a 6.3% improvement on planning tasks, a 4.0% improvement on general reasoning (MMLU), and a 1.0% improvement on mathematics (AIME24). Significantly, our ablation studies confirm that learning in prototype space also yields better generalization to structurally similar problems than training solely on natural language representations, validating our hypothesis that reasoning prototypes serve as the foundation for generalizable reasoning in large language models.
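To make the idea of interpreter-based verification in feature (2) more concrete, the following is a minimal sketch of how a candidate plan could be checked mechanically. It is not the paper's verifier, which relies on actual Prolog/PDDL interpreters; it assumes a toy STRIPS-like blocks domain, and the `Action` class, the `validate_plan` function, and all fact and action names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A ground STRIPS-style action (hypothetical toy structure, not PDDL itself)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def validate_plan(initial_state, goal, actions, plan):
    """Return True iff executing `plan` from `initial_state` satisfies `goal`.

    Each step must name a known action whose preconditions hold in the
    current state; otherwise the plan is rejected.
    """
    state = set(initial_state)
    for step in plan:
        action = actions.get(step)
        if action is None or not action.preconditions <= state:
            return False
        # Apply the action: remove deleted facts, then add new facts.
        state = (state - action.del_effects) | action.add_effects
    return goal <= state


# Assumed toy domain: pick up block A from the table and stack it on block B.
ACTIONS = {
    "pick-up-a": Action(
        "pick-up-a",
        frozenset({"on-table-a", "hand-empty"}),
        frozenset({"holding-a"}),
        frozenset({"on-table-a", "hand-empty"}),
    ),
    "stack-a-on-b": Action(
        "stack-a-on-b",
        frozenset({"holding-a", "clear-b"}),
        frozenset({"on-a-b", "hand-empty"}),
        frozenset({"holding-a", "clear-b"}),
    ),
}
INITIAL = frozenset({"on-table-a", "clear-b", "hand-empty"})
GOAL = frozenset({"on-a-b"})

if __name__ == "__main__":
    print(validate_plan(INITIAL, GOAL, ACTIONS, ["pick-up-a", "stack-a-on-b"]))
    print(validate_plan(INITIAL, GOAL, ACTIONS, ["stack-a-on-b"]))
```

In this sketch a model-generated plan is accepted only if every step is applicable and the goal holds at the end, which is the kind of binary, automatically checkable feedback that a formal representation makes possible.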
