FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control

Abstract

Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks. In this report, we introduce FastTD3, a simple, fast, and capable RL algorithm that significantly speeds up training for humanoid robots in popular suites such as HumanoidBench, IsaacLab, and MuJoCo Playground. Our recipe is remarkably simple: we train an off-policy TD3 agent with several modifications (parallel simulation, large-batch updates, a distributional critic, and carefully tuned hyperparameters). FastTD3 solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU while remaining stable during training. We also provide a lightweight and easy-to-use implementation of FastTD3 to accelerate RL research in robotics.
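As a rough illustration of the TD3 backbone that the recipe builds on, the sketch below computes the standard TD3 bootstrap target (clipped double-Q learning with target policy smoothing) for one batch of transitions, for example a large batch sampled from many parallel environments. It is a minimal NumPy sketch under stated assumptions: the function name `td3_target`, the plain callables standing in for the actor and critic networks, and the default hyperparameters are all illustrative, and it deliberately omits the distributional critic. It is not FastTD3's implementation.

```python
import numpy as np


def td3_target(rewards, dones, next_obs, actor_target, critic1_target,
               critic2_target, gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=None):
    """Standard TD3 bootstrap target for a batch (hypothetical helper).

    Shapes: rewards, dones -> (B,); next_obs -> (B, obs_dim).
    The actor/critic arguments are plain callables standing in for networks.
    """
    rng = np.random.default_rng() if rng is None else rng
    next_actions = actor_target(next_obs)

    # Target policy smoothing: add clipped Gaussian noise to the target action,
    # then keep the result inside the valid action range.
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_actions.shape),
                    -noise_clip, noise_clip)
    next_actions = np.clip(next_actions + noise, -max_action, max_action)

    # Clipped double-Q: elementwise minimum of the two target critics.
    q_next = np.minimum(critic1_target(next_obs, next_actions),
                        critic2_target(next_obs, next_actions))

    # No bootstrapping past terminal transitions (dones == 1).
    return rewards + gamma * (1.0 - dones) * q_next
```

Taking the minimum of two critics counters the overestimation bias of a single learned Q-function, while the clipped noise smooths the value estimate around the target action; with many parallel environments, the same function simply runs over a larger batch dimension `B`.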
