ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
February 10, 2025
Authors: Ling Yang, Zhaochen Yu, Bin Cui, Mengdi Wang
cs.AI
Abstract
We show that hierarchical LLM reasoning via scaling thought templates can
effectively optimize the reasoning search space and surpass the mathematical
reasoning capabilities of powerful LLMs like OpenAI o1-preview and DeepSeek-V3.
We train our ReasonFlux-32B model with only 8 GPUs and introduce three
innovations: (i) a structured and generic thought template library, containing
around 500 high-level thought templates capable of generalizing to similar or
relevant reasoning problems; (ii) performing hierarchical reinforcement
learning on a sequence of thought templates instead of long CoTs, optimizing a
base LLM to plan out an optimal template trajectory for gradually handling
complex problems; (iii) a new inference scaling system that enables
hierarchical LLM reasoning by adaptively scaling thought templates at inference
time. With a template trajectory containing sequential thought templates, our
ReasonFlux-32B significantly advances math reasoning capabilities to
state-of-the-art levels. Notably, on the MATH benchmark, it achieves an
accuracy of 91.2% and surpasses o1-preview by 6.7%. On the American
Invitational Mathematics Examination (AIME) benchmark, ReasonFlux-32B solves
an average of 56.7% of problems,
surpassing o1-preview and DeepSeek-V3 by 27% and 45%, respectively. Code:
https://github.com/Gen-Verse/ReasonFlux
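
To make the three innovations concrete, below is a minimal Python sketch of how a thought-template library, a trajectory planner, and an adaptive inference loop could fit together. Everything in it (ThoughtTemplate, plan_trajectory, solve, and the tag-matching scheme) is a hypothetical illustration, not the paper's implementation; in particular, ReasonFlux trains its planner with hierarchical reinforcement learning, whereas this sketch substitutes a simple greedy tag-overlap heuristic.

```python
# Illustrative sketch of the pipeline the abstract describes:
# (i) a library of high-level thought templates, (ii) a planner that selects
# a template trajectory for a given problem, and (iii) inference-time
# instantiation of each template in sequence with early stopping.
# All names here are hypothetical stand-ins, not ReasonFlux's actual API.

from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ThoughtTemplate:
    """One reusable high-level reasoning step (e.g. 'complete the square')."""
    name: str
    strategy: str     # abstract guidance, not a worked solution
    tags: frozenset   # problem features the template generalizes over

def plan_trajectory(problem_tags: set, library: list, max_steps: int = 4) -> list:
    """Greedy stand-in for the RL-trained hierarchical planner: rank templates
    by tag overlap with the problem and keep the top few that match at all."""
    ranked = sorted(library, key=lambda t: len(t.tags & problem_tags), reverse=True)
    return [t for t in ranked[:max_steps] if t.tags & problem_tags]

def solve(problem: str, problem_tags: set, library: list,
          llm: Callable[[str], str]) -> str:
    """Instantiate each planned template against the evolving partial solution,
    adaptively scaling: stop as soon as the model emits a final answer."""
    state = problem
    for template in plan_trajectory(problem_tags, library):
        state = llm(
            f"Apply the strategy '{template.name}': {template.strategy}\n"
            f"Progress so far:\n{state}"
        )
        if "ANSWER:" in state:  # adaptive inference scaling: no wasted steps
            break
    return state
```

The design choice the abstract emphasizes is that reinforcement learning operates over sequences of these high-level templates rather than over long token-level chains of thought, which shrinks the search space the planner must optimize.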