TD-MPC2: Scalable, Robust World Models for Continuous Control
October 25, 2023
Authors: Nicklas Hansen, Hao Su, Xiaolong Wang
cs.AI
Abstract
TD-MPC is a model-based reinforcement learning (RL) algorithm that performs
local trajectory optimization in the latent space of a learned implicit
(decoder-free) world model. In this work, we present TD-MPC2: a series of
improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves
significantly over baselines across 104 online RL tasks spanning 4 diverse task
domains, achieving consistently strong results with a single set of
hyperparameters. We further show that agent capabilities increase with model
and data size, and successfully train a single 317M parameter agent to perform
80 tasks across multiple task domains, embodiments, and action spaces. We
conclude with an account of lessons, opportunities, and risks associated with
large TD-MPC2 agents. Explore videos, models, data, code, and more at
https://nicklashansen.github.io/td-mpc2
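
To make "local trajectory optimization in the latent space of a learned implicit world model" concrete, here is a minimal sketch of the general idea. It is not the authors' implementation. The abstract does not specify the optimizer or the model components, so this sketch assumes a simple sampling-based (cross-entropy-style) optimizer, and the functions `toy_dynamics`, `toy_reward`, and `toy_value` are hypothetical stand-ins for learned networks. The point it illustrates is that candidate action sequences are rolled out and scored purely on latent states, with no observation ever decoded.

```python
import numpy as np


def plan(z0, dynamics, reward, value, horizon=5, num_samples=256,
         num_elites=32, iterations=4, action_dim=2, seed=0):
    """Sampling-based trajectory optimization carried out in latent space.

    Illustrative only: the optimizer and all hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences and clip them to [-1, 1].
        noise = rng.standard_normal((num_samples, horizon, action_dim))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        # Roll every sequence forward in latent space (nothing is decoded).
        z = np.repeat(z0[None, :], num_samples, axis=0)
        returns = np.zeros(num_samples)
        for t in range(horizon):
            returns += reward(z, actions[:, t])
            z = dynamics(z, actions[:, t])
        # Add a terminal value estimate to account for rewards past the horizon.
        returns += value(z)
        # Refit the sampling distribution to the highest-return sequences.
        elite = actions[np.argsort(returns)[-num_elites:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    # Only the first action is executed; planning is repeated at the next step.
    return mean[0]


# Hypothetical stand-ins for learned networks (assumptions for illustration).
def toy_dynamics(z, a):
    return z + 0.1 * a  # the action nudges the latent state


def toy_reward(z, a):
    return -np.sum(z ** 2, axis=-1)  # reward is highest at the latent origin


def toy_value(z):
    return -np.sum(z ** 2, axis=-1)  # terminal value with the same shape


z0 = np.array([1.0, -1.0])
first_action = plan(z0, toy_dynamics, toy_reward, toy_value)
```

In this toy setup the planner should push the latent state toward the origin, so the first action has a negative first component and a positive second one. In a real agent the three stand-in functions would be replaced by trained models, and the planner would be called again at every environment step.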