A Survey of Reinforcement Learning for Large Reasoning Models

Abstract

In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly on complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into Large Reasoning Models (LRMs). With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. It is therefore timely to revisit the development of this domain, reassess its trajectory, and explore strategies for improving the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research that applies RL to the reasoning abilities of LLMs and LRMs, especially work published since the release of DeepSeek-R1, covering foundational components, core problems, training resources, and downstream applications, in order to identify future opportunities and directions for this rapidly evolving area. We hope this review will promote future research on RL for broader reasoning models. GitHub: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs
