Towards General-Purpose Model-Free Reinforcement Learning

Abstract

Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks and rely on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks, but at the cost of increased complexity and slow run times, which limits their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.

