ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

Abstract

Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain generalization. However, the mechanisms underlying this transfer remain poorly understood. We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes: fundamental reasoning patterns that capture the essence of problems across domains. These prototypes minimize the nuances of the representation, revealing that seemingly diverse tasks are grounded in shared reasoning structures. Based on this hypothesis, we propose ProtoReasoning, a framework that enhances the reasoning ability of LLMs by leveraging scalable and verifiable prototypical representations (Prolog for logical reasoning, PDDL for planning). ProtoReasoning features: (1) an automated prototype construction pipeline that transforms problems into corresponding prototype representations; (2) a comprehensive verification system that provides reliable feedback through Prolog/PDDL interpreters; (3) the scalability to synthesize problems arbitrarily within prototype space while ensuring correctness. Extensive experiments show that ProtoReasoning achieves a 4.7% improvement over baseline models on logical reasoning (Enigmata-Eval), a 6.3% improvement on planning tasks, a 4.0% improvement on general reasoning (MMLU), and a 1.0% improvement on mathematics (AIME24). Notably, our ablation studies confirm that learning in prototype space also yields better generalization to structurally similar problems than training solely on natural language representations, validating our hypothesis that reasoning prototypes serve as the foundation for generalizable reasoning in large language models.
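To make the verification idea in feature (2) concrete, the following is a minimal sketch of interpreter-backed answer checking. It is not the authors' implementation: a tiny forward-chaining engine over ground Horn clauses stands in for a real Prolog interpreter, and the names `derive` and `verify`, the rule encoding, and the toy ancestor problem are all hypothetical, introduced only for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Assumption: a rule is (premises, conclusion) over ground atoms written as
# plain strings. Real Prolog supports variables and unification; this does not.
Rule = Tuple[FrozenSet[str], str]


def derive(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Forward-chain until no rule adds a new atom (toy stand-in for a Prolog engine)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule once all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


def verify(model_answer: bool, query: str, facts: Iterable[str], rules: List[Rule]) -> bool:
    """Return True when the model's yes/no answer matches what the engine derives."""
    return model_answer == (query in derive(facts, rules))


# Hypothetical toy problem: ancestor relations, already grounded by hand.
facts = ["parent(ann,bob)", "parent(bob,cy)"]
rules = [
    (frozenset({"parent(ann,bob)"}), "ancestor(ann,bob)"),
    (frozenset({"parent(bob,cy)"}), "ancestor(bob,cy)"),
    (frozenset({"ancestor(ann,bob)", "ancestor(bob,cy)"}), "ancestor(ann,cy)"),
]
```

Because the reference answer comes from executing the formal representation rather than from a human label, any problem synthesized in this space can be checked automatically, which is the property the abstract relies on for scalable synthesis with guaranteed correctness.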
