TD-MPC2: Scalable, Robust World Models for Continuous Control
October 25, 2023
Authors: Nicklas Hansen, Hao Su, Xiaolong Wang
cs.AI
Abstract
TD-MPC is a model-based reinforcement learning (RL) algorithm that performs
local trajectory optimization in the latent space of a learned implicit
(decoder-free) world model. In this work, we present TD-MPC2: a series of
improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves
significantly over baselines across 104 online RL tasks spanning 4 diverse task
domains, achieving consistently strong results with a single set of
hyperparameters. We further show that agent capabilities increase with model
and data size, and successfully train a single 317M parameter agent to perform
80 tasks across multiple task domains, embodiments, and action spaces. We
conclude with an account of lessons, opportunities, and risks associated with
large TD-MPC2 agents. Explore videos, models, data, code, and more at
https://nicklashansen.github.io/td-mpc2
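The abstract describes TD-MPC as local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. The sketch below is a minimal illustration of that idea, not the authors' implementation. It assumes an MPPI/CEM-style sampling planner, and it replaces the learned encoder, latent dynamics, reward, and terminal value with hypothetical hand-written toy functions (`encode`, `dynamics`, `reward`, `value`) so that it runs on its own. Each candidate action sequence is rolled out purely in latent space, scored by its discounted reward plus a bootstrapped terminal value, and the sampling distribution is refit to the best sequences. Only the first action of the final plan is returned, as in receding-horizon control.

```python
import numpy as np

# Hypothetical toy stand-ins for learned networks (assumption: in a real
# agent these are neural nets trained from data; here they are fixed).
LATENT_DIM, ACTION_DIM = 4, 2
_rng = np.random.default_rng(0)
A = 0.9 * np.eye(LATENT_DIM)
B = _rng.normal(scale=0.5, size=(LATENT_DIM, ACTION_DIM))

def encode(obs):
    # Observation -> latent state. There is no decoder back to observations.
    return np.asarray(obs, dtype=float)

def dynamics(z, a):
    # Toy latent transition: z' = A z + B a (batched over rows).
    return z @ A.T + a @ B.T

def reward(z, a):
    # Toy reward: keep the latent state near the origin, penalize large actions.
    return -np.sum(z**2, axis=-1) - 0.1 * np.sum(a**2, axis=-1)

def value(z):
    # Toy terminal value estimate used to bootstrap beyond the horizon.
    return -10.0 * np.sum(z**2, axis=-1)

def plan(obs, horizon=5, num_samples=256, num_elites=32, iterations=6,
         temperature=0.5, gamma=0.99, seed=0):
    """Sampling-based trajectory optimization in latent space (sketch)."""
    gen = np.random.default_rng(seed)
    z0 = encode(obs)
    mean = np.zeros((horizon, ACTION_DIM))
    std = np.ones((horizon, ACTION_DIM))
    for _ in range(iterations):
        # Sample action sequences around the current mean, clipped to [-1, 1].
        noise = gen.standard_normal((num_samples, horizon, ACTION_DIM))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        # Roll every sequence out in latent space and accumulate discounted reward.
        z = np.repeat(z0[None], num_samples, axis=0)
        returns = np.zeros(num_samples)
        for t in range(horizon):
            returns += gamma**t * reward(z, actions[:, t])
            z = dynamics(z, actions[:, t])
        returns += gamma**horizon * value(z)
        # Keep the top-k sequences; weight them by exp(return / temperature),
        # subtracting the max return first for numerical stability.
        elite_idx = np.argsort(returns)[-num_elites:]
        elite_returns = returns[elite_idx]
        weights = np.exp((elite_returns - elite_returns.max()) / temperature)
        weights /= weights.sum()
        elites = actions[elite_idx]
        # Refit the sampling distribution to the weighted elites.
        mean = np.einsum("k,kha->ha", weights, elites)
        std = np.sqrt(np.einsum("k,kha->ha", weights, (elites - mean) ** 2))
        std = np.clip(std, 0.05, 2.0)
    # Receding horizon: execute only the first planned action.
    return mean[0]

if __name__ == "__main__":
    print(plan([1.0, -0.5, 0.3, 0.0]))
```

Because the mean is a convex combination of clipped samples, the returned action always stays within the assumed action bounds of [-1, 1]. Planning entirely in latent space with a terminal value is what lets an implicit world model skip reconstructing observations; the specific planner, hyperparameters, and models here are illustrative assumptions.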