Reinforcement Learning in Vision: A Survey

Abstract

Recent advances at the intersection of reinforcement learning (RL) and visual intelligence have enabled agents that not only perceive complex visual scenes but also reason, generate, and act within them. This survey offers a critical and up-to-date synthesis of the field. We first formalize visual RL problems and trace the evolution of policy-optimization strategies, from RLHF to verifiable-reward paradigms and from Proximal Policy Optimization to Group Relative Policy Optimization. We then organize more than 200 representative works into four thematic pillars: multi-modal large language models, visual generation, unified model frameworks, and vision-language-action models. For each pillar, we examine algorithmic design, reward engineering, and benchmark progress, and we distill trends such as curriculum-driven training, preference-aligned diffusion, and unified reward modeling. Finally, we review evaluation protocols spanning set-level fidelity, sample-level preference, and state-level stability, and we identify open challenges, including sample efficiency, generalization, and safe deployment. Our goal is to provide researchers and practitioners with a coherent map of the rapidly expanding landscape of visual RL and to highlight promising directions for future inquiry. Resources are available at: https://github.com/weijiawu/Awesome-Visual-Reinforcement-Learning.
