

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

June 18, 2025
作者: Feng He, Zijun Chen, Xinnian Liang, Tingting Ma, Yunqi Qiu, Shuangzhi Wu, Junchi Yan
cs.AI

Abstract

Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain generalization capabilities. However, the underlying mechanisms supporting such transfer remain poorly understood. We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes -- fundamental reasoning patterns that capture the essence of problems across domains. These prototypes minimize the nuances of the representation, revealing that seemingly diverse tasks are grounded in shared reasoning structures. Based on this hypothesis, we propose ProtoReasoning, a framework that enhances the reasoning ability of LLMs by leveraging scalable and verifiable prototypical representations (Prolog for logical reasoning, PDDL for planning). ProtoReasoning features: (1) an automated prototype construction pipeline that transforms problems into corresponding prototype representations; (2) a comprehensive verification system providing reliable feedback through Prolog/PDDL interpreters; (3) the scalability to synthesize problems arbitrarily within prototype space while ensuring correctness. Extensive experiments show that ProtoReasoning achieves 4.7% improvement over baseline models on logical reasoning (Enigmata-Eval), 6.3% improvement on planning tasks, 4.0% improvement on general reasoning (MMLU), and 1.0% on mathematics (AIME24). Significantly, our ablation studies confirm that learning in prototype space also demonstrates enhanced generalization to structurally similar problems compared to training solely on natural language representations, validating our hypothesis that reasoning prototypes serve as the foundation for generalizable reasoning in large language models.
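The core idea of a "verifiable prototype representation" can be illustrated with a toy sketch. The paper's actual pipeline relies on real Prolog/PDDL interpreters for verification; the snippet below is only an illustrative stand-in, using a minimal Horn-clause forward-chaining engine in Python to show how a problem reduced to its logical prototype can be checked mechanically (all facts, rules, and names here are invented for illustration):

```python
# Illustrative sketch (NOT the paper's implementation): a toy Horn-clause
# engine standing in for a Prolog interpreter, to show how answers in a
# "prototype space" can receive reliable, automatic verification.

def forward_chain(facts, rules):
    """Derive every atom reachable from `facts` under `rules`.
    Each rule is a pair (body, head): if all atoms in `body` are known,
    `head` is derived. Iterate until no new atoms appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known

def verify(facts, rules, query):
    """Verifier role: report whether `query` is entailed by the prototype."""
    return query in forward_chain(facts, rules)

# A toy logical-reasoning problem, stripped of surface details (hypothetical):
facts = {"rainy", "has_umbrella"}
rules = [
    ({"rainy"}, "wet_ground"),
    ({"wet_ground", "has_umbrella"}, "stays_dry"),
]

print(verify(facts, rules, "stays_dry"))  # True: derivable from the prototype
print(verify(facts, rules, "sunny"))      # False: not entailed
```

Because entailment is checked mechanically, any number of problems synthesized inside this representation come with correctness guarantees for free, which is the property the framework exploits at scale.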