The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Abstract

The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM-RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov decision processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.
