
A Survey of Reinforcement Learning for Large Reasoning Models

September 10, 2025
作者: Kaiyan Zhang, Yuxin Zuo, Bingxiang He, Youbang Sun, Runze Liu, Che Jiang, Yuchen Fan, Kai Tian, Guoli Jia, Pengfei Li, Yu Fu, Xingtai Lv, Yuchen Zhang, Sihang Zeng, Shang Qu, Haozhan Li, Shijie Wang, Yuru Wang, Xinwei Long, Fangfu Liu, Xiang Xu, Jiaze Ma, Xuekai Zhu, Ermo Hua, Yihao Liu, Zonglin Li, Huayu Chen, Xiaoye Qu, Yafu Li, Weize Chen, Zhenzhao Yuan, Junqi Gao, Dong Li, Zhiyuan Ma, Ganqu Cui, Zhiyuan Liu, Biqing Qi, Ning Ding, Bowen Zhou
cs.AI

Abstract

In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into Large Reasoning Models (LRMs). With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. To this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs to improve reasoning abilities, especially since the release of DeepSeek-R1, covering foundational components, core problems, training resources, and downstream applications, in order to identify future opportunities and directions for this rapidly evolving area. We hope this review will promote future research on RL for broader reasoning models. GitHub: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs