Scalable Chain of Thoughts via Elastic Reasoning

May 8, 2025
Authors: Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong
cs.AI

Abstract

Large reasoning models (LRMs) have achieved remarkable progress on complex tasks by generating extended chains of thought (CoT). However, their uncontrolled output lengths pose significant challenges for real-world deployment, where inference-time budgets on tokens, latency, or compute are strictly constrained. We propose Elastic Reasoning, a novel framework for scalable chain of thoughts that explicitly separates reasoning into two phases (thinking and solution) with independently allocated budgets. At test time, Elastic Reasoning prioritizes the completeness of the solution segment, significantly improving reliability under tight resource constraints. To train models that are robust to truncated thinking, we introduce a lightweight budget-constrained rollout strategy, integrated into GRPO, which teaches the model to reason adaptively when the thinking process is cut short and generalizes effectively to unseen budget constraints without additional training. Empirical results on mathematical (AIME, MATH500) and programming (LiveCodeBench, Codeforces) benchmarks demonstrate that Elastic Reasoning performs robustly under strict budget constraints while incurring significantly lower training cost than baseline methods. Remarkably, our approach also produces more concise and efficient reasoning even in unconstrained settings. Elastic Reasoning offers a principled and practical solution to the pressing challenge of controllable reasoning at scale.
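
The two-phase budgeting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `</think>` delimiter, the `<eos>` marker, the `elastic_generate` function, and the `toy_model` stand-in are all assumptions introduced for illustration. The sketch only shows the idea that thinking is cut off at its own budget while a separate budget stays reserved for the solution.

```python
from typing import Callable, List

THINK_END = "</think>"  # assumed delimiter; the abstract does not name one
EOS = "<eos>"           # assumed end-of-sequence marker


def elastic_generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    think_budget: int,
    solution_budget: int,
) -> List[str]:
    """Decode with separate token budgets for the thinking and solution phases."""
    tokens = list(prompt)

    # Phase 1: thinking, capped at think_budget tokens.
    for _ in range(think_budget):
        tok = next_token(tokens)
        if tok == THINK_END:
            break
        tokens.append(tok)

    # Close the thinking phase, whether the model ended it or the budget did.
    tokens.append(THINK_END)

    # Phase 2: solution, drawn from its own reserved budget.
    for _ in range(solution_budget):
        tok = next_token(tokens)
        if tok == EOS:
            break
        tokens.append(tok)

    # Return only the newly generated tokens.
    return tokens[len(prompt):]


def toy_model(context: List[str]) -> str:
    """Hypothetical stand-in for a model: 10 thinking steps, then a 2-token answer."""
    if THINK_END not in context:
        return "step" if context.count("step") < 10 else THINK_END
    after = context[context.index(THINK_END) + 1:]
    return ["answer", "42", EOS][min(len(after), 2)]


if __name__ == "__main__":
    # Thinking is truncated after 3 tokens, but the solution is still emitted in full.
    print(elastic_generate(toy_model, ["q"], think_budget=3, solution_budget=5))
```

With a real model, `next_token` would wrap a decoding step. Per the abstract, the budget-constrained rollouts used during GRPO training are what teach the model to produce a useful solution after truncated thinking; this sketch covers only the test-time side.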
