FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control

Abstract

Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks. In this report, we introduce FastTD3, a simple, fast, and capable RL algorithm that significantly speeds up training for humanoid robots in popular suites such as HumanoidBench, IsaacLab, and MuJoCo Playground. Our recipe is remarkably simple: we train an off-policy TD3 agent with several modifications (parallel simulation, large-batch updates, a distributional critic, and carefully tuned hyperparameters). FastTD3 solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU, while remaining stable during training. We also provide a lightweight and easy-to-use implementation of FastTD3 to accelerate RL research in robotics.
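For readers unfamiliar with the TD3 base algorithm named in the abstract, the following is a minimal sketch of its two standard ingredients: target policy smoothing and the clipped double-Q target. It is not the FastTD3 implementation. It uses scalar critics rather than the distributional critic mentioned above, plain Python lists instead of GPU tensors, and the function names and default values (`gamma`, `noise_std`, `noise_clip`) are illustrative assumptions, not the report's tuned hyperparameters.

```python
import random


def smoothed_target_action(action, rng, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target
    policy's action, then clip the result to the action bounds.
    All default values here are assumptions for illustration."""
    smoothed = []
    for a in action:
        noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(low, min(high, a + noise)))
    return smoothed


def td3_targets(rewards, dones, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q targets for a batch of transitions:
    y = r + gamma * (1 - done) * min(Q1', Q2')."""
    return [r + gamma * (1.0 - d) * min(q1, q2)
            for r, d, q1, q2 in zip(rewards, dones, next_q1, next_q2)]


# Toy batch of two transitions; the second one is terminal (done = 1).
targets = td3_targets(rewards=[1.0, 1.0], dones=[0.0, 1.0],
                      next_q1=[2.0, 2.0], next_q2=[3.0, 3.0], gamma=0.5)
print(targets)  # [2.0, 1.0]

# Smoothed action for one 3-dimensional target action.
rng = random.Random(0)
print(smoothed_target_action([0.0, 0.9, -1.0], rng))
```

Taking the minimum of the two target critics counteracts Q-value overestimation, and the terminal transition bootstraps nothing, so its target is just the reward. In a large-batch, parallel-simulation setting the same arithmetic would be applied to whole tensors at once rather than looped over in Python.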