

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

June 18, 2025
作者: Feng He, Zijun Chen, Xinnian Liang, Tingting Ma, Yunqi Qiu, Shuangzhi Wu, Junchi Yan
cs.AI

Abstract

Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain generalization capabilities. However, the underlying mechanisms supporting such transfer remain poorly understood. We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes -- fundamental reasoning patterns that capture the essence of problems across domains. These prototypes minimize the nuances of the representation, revealing that seemingly diverse tasks are grounded in shared reasoning structures. Based on this hypothesis, we propose ProtoReasoning, a framework that enhances the reasoning ability of LLMs by leveraging scalable and verifiable prototypical representations (Prolog for logical reasoning, PDDL for planning). ProtoReasoning features: (1) an automated prototype construction pipeline that transforms problems into corresponding prototype representations; (2) a comprehensive verification system providing reliable feedback through Prolog/PDDL interpreters; (3) the scalability to synthesize problems arbitrarily within prototype space while ensuring correctness. Extensive experiments show that ProtoReasoning achieves a 4.7% improvement over baseline models on logical reasoning (Enigmata-Eval), a 6.3% improvement on planning tasks, a 4.0% improvement on general reasoning (MMLU), and a 1.0% improvement on mathematics (AIME24). Significantly, our ablation studies confirm that learning in prototype space also demonstrates enhanced generalization to structurally similar problems compared to training solely on natural language representations, validating our hypothesis that reasoning prototypes serve as the foundation for generalizable reasoning in large language models.
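To make the prototype idea concrete: the paper's verification system runs Prolog/PDDL interpreters, but the core pattern -- encode a problem as logic clauses, then check an answer mechanically -- can be sketched with a toy forward-chaining engine. The sketch below is purely illustrative and is not the paper's implementation; the kinship facts, the rule, and the `forward_chain` helper are all assumptions introduced here.

```python
# Illustrative sketch (NOT the paper's pipeline): a minimal forward-chaining
# engine over Prolog-style Horn clauses, showing how an answer to a problem
# expressed in "prototype space" can be verified automatically.

from itertools import product

# Facts and rules for a small kinship puzzle. Atoms are tuples
# (predicate, arg1, arg2); uppercase single letters are variables.
facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}
rules = [
    # grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    (("grandparent", "X", "Z"), [("parent", "X", "Y"), ("parent", "Y", "Z")]),
]

def forward_chain(facts, rules):
    """Derive every ground fact reachable from `facts` under `rules`."""
    derived = set(facts)
    constants = {t for f in facts for t in f[1:]}  # ground terms seen so far
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # Collect the rule's variables (uppercase tokens).
            variables = sorted({t for atom in body + [head]
                                for t in atom[1:] if t.isupper()})
            # Try every assignment of constants to variables.
            for binding in product(constants, repeat=len(variables)):
                env = dict(zip(variables, binding))
                ground = lambda a: (a[0], *(env.get(t, t) for t in a[1:]))
                if all(ground(atom) in derived for atom in body):
                    new_fact = ground(head)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

closure = forward_chain(facts, rules)
# The candidate answer "alice is carol's grandparent" is verified iff it is
# in the deductive closure.
print(("grandparent", "alice", "carol") in closure)  # True
```

A real Prolog interpreter, as used in ProtoReasoning, replaces this naive grounding with unification and resolution, which is what makes the feedback both reliable and scalable to arbitrarily synthesized problems.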