

A Survey of Reinforcement Learning for Large Reasoning Models

September 10, 2025
Authors: Kaiyan Zhang, Yuxin Zuo, Bingxiang He, Youbang Sun, Runze Liu, Che Jiang, Yuchen Fan, Kai Tian, Guoli Jia, Pengfei Li, Yu Fu, Xingtai Lv, Yuchen Zhang, Sihang Zeng, Shang Qu, Haozhan Li, Shijie Wang, Yuru Wang, Xinwei Long, Fangfu Liu, Xiang Xu, Jiaze Ma, Xuekai Zhu, Ermo Hua, Yihao Liu, Zonglin Li, Huayu Chen, Xiaoye Qu, Yafu Li, Weize Chen, Zhenzhao Yuan, Junqi Gao, Dong Li, Zhiyuan Ma, Ganqu Cui, Zhiyuan Liu, Biqing Qi, Ning Ding, Bowen Zhou
cs.AI

Abstract

In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into Large Reasoning Models (LRMs). With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. It is therefore timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs to improve reasoning abilities, especially since the release of DeepSeek-R1, covering foundational components, core problems, training resources, and downstream applications, in order to identify future opportunities and directions for this rapidly evolving area. We hope this review will promote future research on RL for broader reasoning models. GitHub: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs