
Towards General-Purpose Model-Free Reinforcement Learning

January 27, 2025
Authors: Scott Fujimoto, Pierluca D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat
cs.AI

Abstract

Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.
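The central idea in the abstract, shaping a state-action embedding with model-based objectives so that the value function becomes approximately linear in it while the algorithm itself stays model-free, can be illustrated with a short sketch. The code below is a minimal illustration under assumed names (`Encoder`, `LinearQ`, `representation_loss`, and all dimensions are illustrative) and is not the authors' MR.Q implementation: the embedding is trained on dense reward- and next-state-prediction targets, but the learned model is never used for planning or to generate simulated trajectories, and Q is fit as a linear head on top of the embedding.

```python
# Minimal sketch (not the authors' code) of the idea described in the abstract:
# learn phi(s, a) with model-based prediction losses so that Q(s, a) ~= w^T phi(s, a),
# then train the Q head with ordinary model-free updates. Names and sizes are assumptions.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps a (state, action) pair to an embedding z = phi(s, a)."""

    def __init__(self, state_dim, action_dim, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class LinearQ(nn.Module):
    """Value head that is linear in the embedding: Q(s, a) ~= w^T phi(s, a)."""

    def __init__(self, embed_dim=128):
        super().__init__()
        self.w = nn.Linear(embed_dim, 1)

    def forward(self, z):
        return self.w(z)


def representation_loss(encoder, reward_head, dynamics_head,
                        state, action, reward, next_state):
    """Dense model-based targets (reward and next-state prediction) shape the
    embedding; the predictions themselves are never used for planning."""
    z = encoder(state, action)
    reward_loss = nn.functional.mse_loss(reward_head(z).squeeze(-1), reward)
    dynamics_loss = nn.functional.mse_loss(dynamics_head(z), next_state)
    return reward_loss + dynamics_loss


if __name__ == "__main__":
    state_dim, action_dim, embed_dim = 8, 2, 128
    encoder = Encoder(state_dim, action_dim, embed_dim)
    q_head = LinearQ(embed_dim)
    reward_head = nn.Linear(embed_dim, 1)            # predicts r(s, a)
    dynamics_head = nn.Linear(embed_dim, state_dim)  # predicts s'

    # Random batch standing in for replay-buffer samples.
    s = torch.randn(32, state_dim)
    a = torch.randn(32, action_dim)
    r = torch.randn(32)
    s_next = torch.randn(32, state_dim)

    loss = representation_loss(encoder, reward_head, dynamics_head, s, a, r, s_next)
    q_values = q_head(encoder(s, a))  # a model-free TD objective would train this head
    print(loss.item(), q_values.shape)
```

Because the value function is approximately linear in the learned embedding, the same encoder and a single set of hyperparameters can, in principle, be reused across domains without the planning or rollout costs of model-based methods.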

