ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
June 18, 2025
Authors: Feng He, Zijun Chen, Xinnian Liang, Tingting Ma, Yunqi Qiu, Shuangzhi Wu, Junchi Yan
cs.AI
Abstract
Recent advances in Large Reasoning Models (LRMs) trained with Long
Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain
generalization capabilities. However, the underlying mechanisms supporting such
transfer remain poorly understood. We hypothesize that cross-domain
generalization arises from shared abstract reasoning prototypes -- fundamental
reasoning patterns that capture the essence of problems across domains. These
prototypes minimize the nuances of the representation, revealing that seemingly
diverse tasks are grounded in shared reasoning structures. Based on this
hypothesis, we propose ProtoReasoning, a framework that enhances the reasoning
ability of LLMs by leveraging scalable and verifiable prototypical
representations (Prolog for logical reasoning, PDDL for
planning). ProtoReasoning features: (1) an automated prototype construction
pipeline that transforms problems into corresponding prototype representations;
(2) a comprehensive verification system providing reliable feedback through
Prolog/PDDL interpreters; (3) the scalability to synthesize problems
arbitrarily within prototype space while ensuring correctness. Extensive
experiments show that ProtoReasoning achieves 4.7% improvement over baseline
models on logical reasoning (Enigmata-Eval), 6.3% improvement on planning
tasks, 4.0% improvement on general reasoning (MMLU) and 1.0% on mathematics
(AIME24). Notably, our ablation studies confirm that learning in
prototype space also demonstrates enhanced generalization to structurally
similar problems compared to training solely on natural language
representations, validating our hypothesis that reasoning prototypes serve as
the foundation for generalizable reasoning in large language models.
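The core loop the abstract describes — encode a problem in a verifiable prototype representation, then check answers with an interpreter — can be illustrated with a toy sketch. This is not the authors' pipeline: a real system would emit actual Prolog (or PDDL) and invoke a real interpreter such as SWI-Prolog; here a naive forward-chaining checker over Prolog-like clauses stands in for that interpreter, and all names are illustrative.

```python
# Toy illustration of "prototype representation + verification".
# Clauses are encoded as tuples rather than real Prolog source:
#   human(socrates).            -> ("human", "socrates")
#   mortal(X) :- human(X).      -> rule ("mortal", "human")
facts = {("human", "socrates")}
rules = [
    ("mortal", "human"),  # head(X) :- body(X)
]

def verify(goal, facts, rules):
    """Derive all reachable facts by forward chaining, then check the goal.

    Stands in for the Prolog interpreter the paper uses for feedback:
    a candidate answer is accepted only if it is provable from the
    prototype's facts and rules.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, arg in list(derived):
                if pred == body and (head, arg) not in derived:
                    derived.add((head, arg))
                    changed = True
    return goal in derived

print(verify(("mortal", "socrates"), facts, rules))  # True: provable
print(verify(("mortal", "plato"), facts, rules))     # False: not derivable
```

Because correctness is decided by derivability rather than by string matching, the same machinery can score arbitrarily many synthesized problems in prototype space — the scalability property the abstract claims for the framework.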