TD-MPC2: Scalable, Robust World Models for Continuous Control

Abstract

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://nicklashansen.github.io/td-mpc2
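To make "local trajectory optimization in the latent space of a decoder-free world model" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple cross-entropy-method-style sampling loop as the optimizer, and the names `plan`, `toy_dynamics`, `toy_reward`, and `toy_value` are hypothetical stand-ins for learned networks. The point it illustrates is that candidate action sequences are scored using only predicted latent states, rewards, and a terminal value, with no observation ever being reconstructed.

```python
import numpy as np


def plan(z0, dynamics, reward, value, horizon=5, samples=256,
         elites=32, iters=6, action_dim=2, seed=0):
    """Illustrative CEM-style search over action sequences in latent space."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action sequences and clip them to [-1, 1].
        noise = rng.standard_normal((samples, horizon, action_dim))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        z = np.repeat(z0[None, :], samples, axis=0)
        returns = np.zeros(samples)
        for t in range(horizon):
            returns += reward(z, actions[:, t])
            # Latent rollout: the next latent state is predicted directly,
            # and no observation is decoded at any point.
            z = dynamics(z, actions[:, t])
        # A terminal value estimate accounts for return beyond the horizon.
        returns += value(z)
        # Refit the sampling distribution to the best-scoring sequences.
        elite = actions[np.argsort(returns)[-elites:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3
    # Only the first action is executed; planning repeats at the next step.
    return mean[0]


# Hypothetical toy stand-ins for the learned model components (assumptions).
def toy_dynamics(z, a):
    return z + 0.1 * a


def toy_reward(z, a):
    return -np.sum(z ** 2, axis=-1)


def toy_value(z):
    return -np.sum(z ** 2, axis=-1)


first_action = plan(np.array([1.0, -1.0]), toy_dynamics, toy_reward, toy_value)
print(first_action)
```

In this toy setting the reward is highest at the latent origin, so the planner should push the first latent coordinate down and the second one up. A real agent would replace the three toy functions with learned networks and re-plan at every environment step.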
