FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control

Abstract

Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks. In this report, we introduce FastTD3, a simple, fast, and capable RL algorithm that significantly speeds up training for humanoid robots in popular suites such as HumanoidBench, IsaacLab, and MuJoCo Playground. Our recipe is remarkably simple: we train an off-policy TD3 agent with several modifications, namely parallel simulation, large-batch updates, a distributional critic, and carefully tuned hyperparameters. FastTD3 solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU while remaining stable during training. We also provide a lightweight, easy-to-use implementation of FastTD3 to accelerate RL research in robotics.
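The abstract names a distributional critic but does not specify its parameterization. The sketch below assumes a C51-style categorical critic over a fixed support of return atoms and illustrates, for a single transition, how a TD3 target could be formed from three parts. First, target policy smoothing adds clipped Gaussian noise to the next action. Second, a clipped double-Q choice keeps the more pessimistic of the two target critics' next-state distributions. Third, the shifted distribution r + gamma * (1 - done) * z is projected back onto the support. All function names are hypothetical. The noise defaults (0.2 and 0.5) are the classic TD3 settings, not FastTD3's tuned values. The critic networks are omitted, and plain Python lists stand in for the batched GPU tensors a real agent would use across parallel environments.

```python
import math
import random
from typing import List, Sequence

# Hypothetical helpers: a sketch of a TD3 target with a C51-style categorical
# critic (an assumption), NOT code from the FastTD3 implementation.


def make_support(v_min: float, v_max: float, num_atoms: int) -> List[float]:
    """Evenly spaced return atoms z_0 < ... < z_{N-1}; needs num_atoms >= 2."""
    delta = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta for i in range(num_atoms)]


def smoothed_target_action(action: Sequence[float], noise_std: float = 0.2,
                           noise_clip: float = 0.5, low: float = -1.0,
                           high: float = 1.0, rng=random) -> List[float]:
    """TD3 target policy smoothing: add clipped Gaussian noise per dimension,
    then clip the result to the action bounds."""
    smoothed = []
    for a in action:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(low, min(high, a + eps)))
    return smoothed


def expected_value(probs: Sequence[float], support: Sequence[float]) -> float:
    """Mean return of a categorical distribution over the support."""
    return sum(p * z for p, z in zip(probs, support))


def project_target(probs, reward, done, gamma, support) -> List[float]:
    """Shift each atom to r + gamma * (1 - done) * z, clip it to
    [v_min, v_max], and split its probability between the nearest atoms."""
    v_min, v_max, n = support[0], support[-1], len(support)
    delta = (v_max - v_min) / (n - 1)
    target = [0.0] * n
    for p, z in zip(probs, support):
        tz = min(v_max, max(v_min, reward + gamma * (1.0 - done) * z))
        # Fractional atom index, clamped against floating-point overshoot.
        b = min(float(n - 1), max(0.0, (tz - v_min) / delta))
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:  # landed exactly on an atom
            target[lower] += p
        else:  # linear interpolation between the two neighbouring atoms
            target[lower] += p * (upper - b)
            target[upper] += p * (b - lower)
    return target


def td3_distributional_target(probs_q1, probs_q2, reward, done, gamma,
                              support) -> List[float]:
    """Clipped double-Q: keep the next-state distribution whose mean is
    smaller (the pessimistic critic), then project it onto the support."""
    if expected_value(probs_q1, support) <= expected_value(probs_q2, support):
        chosen = probs_q1
    else:
        chosen = probs_q2
    return project_target(chosen, reward, done, gamma, support)
```

For example, take `make_support(-10.0, 10.0, 11)`, whose atoms are 2.0 apart. A next-state distribution concentrated on the atom 0.0, with reward 1.0, gamma 0.99, and done 0, projects to probability 0.5 on each of the atoms 0.0 and 2.0. The mean of that projected distribution is the expected target 1.0. Selecting the distribution with the smaller mean mirrors TD3's pessimistic minimum over two critics. This is only one possible variant, because the abstract does not say how FastTD3 combines its two critics.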