ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

Abstract

We show that hierarchical LLM reasoning via scaling thought templates can effectively optimize the reasoning search space and outperform the mathematical reasoning capabilities of powerful LLMs such as OpenAI o1-preview and DeepSeek V3. We train our ReasonFlux-32B model with only 8 GPUs and introduce three innovations: (i) a structured and generic thought template library, containing around 500 high-level thought templates capable of generalizing to similar or related reasoning problems; (ii) hierarchical reinforcement learning on a sequence of thought templates instead of long chains of thought (CoTs), which optimizes a base LLM to plan an optimal template trajectory for gradually handling complex problems; (iii) a new inference scaling system that enables hierarchical LLM reasoning by adaptively scaling thought templates at inference time. With a template trajectory containing sequential thought templates, ReasonFlux-32B significantly advances math reasoning capabilities to state-of-the-art levels. Notably, on the MATH benchmark, it achieves an accuracy of 91.2% and surpasses o1-preview by 6.7%. On the American Invitational Mathematics Examination (AIME) benchmark, ReasonFlux-32B solves an average of 56.7% of problems, surpassing o1-preview and DeepSeek-V3 by 27% and 45%, respectively. Code: https://github.com/Gen-Verse/ReasonFlux
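The idea of planning over a sequence of thought templates, rather than over one long chain of thought, can be illustrated with a toy sketch. The snippet below is not the ReasonFlux implementation: the `ThoughtTemplate` class, the three example templates, and the keyword-overlap retrieval are all assumptions made for illustration, whereas the abstract describes a base LLM optimized with hierarchical reinforcement learning to do the planning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThoughtTemplate:
    """Hypothetical record for one high-level thought template."""
    name: str
    tags: frozenset  # keywords describing where the template applies
    description: str


# Toy library (assumed contents); the abstract reports around 500 templates.
LIBRARY = [
    ThoughtTemplate("substitution",
                    frozenset({"equation", "radical", "simplify"}),
                    "Replace a sub-expression with a new variable."),
    ThoughtTemplate("case_analysis",
                    frozenset({"absolute", "inequality", "integer"}),
                    "Split the problem into exhaustive cases."),
    ThoughtTemplate("vieta",
                    frozenset({"polynomial", "roots", "sum"}),
                    "Relate coefficients to sums and products of roots."),
]


def retrieve(step_keywords, library=LIBRARY):
    """Return the template whose tags overlap most with the step keywords."""
    keywords = set(step_keywords)
    return max(library, key=lambda t: len(t.tags & keywords))


def plan_trajectory(steps, library=LIBRARY):
    """Map each high-level step (a keyword list) to one template name."""
    return [retrieve(step, library).name for step in steps]


# Two high-level steps for an imagined problem about polynomial roots.
steps = [["polynomial", "roots"], ["equation", "simplify"]]
print(plan_trajectory(steps))  # ['vieta', 'substitution']
```

The resulting list plays the role of a template trajectory: each entry names a reusable strategy to instantiate for one stage of the problem. A learned planner would replace the keyword matching used here.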

