Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

Abstract

We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for robotic mobile manipulation in indoor environments. Specifically, a Fetch robot (equipped with a mobile base, 7-DoF arm, RGBD camera, egomotion, and onboard sensing) is spawned in a home environment and asked to rearrange objects by navigating to an object, picking it up, navigating to a target location, and then placing the object at the target location. Galactic is fast. In terms of simulation speed (rendering + physics), Galactic achieves over 421,000 steps-per-second (SPS) on an 8-GPU node, which is 54x faster than Habitat 2.0 (7699 SPS). More importantly, Galactic was designed to optimize the entire rendering + physics + RL interplay, since any bottleneck in that interplay slows down training. In terms of simulation+RL speed (rendering + physics + inference + learning), Galactic achieves over 108,000 SPS, which is 88x faster than Habitat 2.0 (1243 SPS). These massive speed-ups not only drastically cut the wall-clock training time of existing experiments, but also unlock an unprecedented scale of new experiments. First, Galactic can train a mobile pick skill to >80% accuracy in under 16 minutes, a 100x speedup compared to the over 24 hours it takes to train the same skill in Habitat 2.0. Second, we use Galactic to perform the largest-scale rearrangement experiment to date, using 5B steps of experience in 46 hours, which is equivalent to 20 years of robot experience. This scaling results in a single neural network composed of task-agnostic components achieving 85% success in GeometricGoal rearrangement, compared to 0% success reported in Habitat 2.0 for the same approach. The code is available at github.com/facebookresearch/galactic.
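The speedup figures quoted above can be sanity-checked with simple arithmetic. The sketch below is not part of the Galactic codebase; it only recomputes the ratios from the numbers stated in the abstract. Values the abstract gives as "over N" or "under N" are treated as bounds, so the computed ratios are approximate. The variable names are illustrative, and the average-throughput line is an inference from the two quoted figures (5B steps, 46 hours), not a number the abstract reports.

```python
# Figures quoted in the abstract. "Over N" values are treated as lower bounds,
# so the resulting ratios are approximate, not exact reproductions.
GALACTIC_SIM_SPS = 421_000    # rendering + physics, 8-GPU node
HABITAT_SIM_SPS = 7_699
GALACTIC_TRAIN_SPS = 108_000  # rendering + physics + inference + learning
HABITAT_TRAIN_SPS = 1_243


def speedup(new_sps: float, old_sps: float) -> float:
    """Return how many times faster new_sps is than old_sps."""
    return new_sps / old_sps


sim_speedup = speedup(GALACTIC_SIM_SPS, HABITAT_SIM_SPS)
train_speedup = speedup(GALACTIC_TRAIN_SPS, HABITAT_TRAIN_SPS)

# Mobile pick skill: "over 24 hours" in Habitat 2.0 vs "under 16 minutes".
pick_speedup = (24 * 60) / 16

# Average throughput implied by 5B steps in 46 hours (an inference, see above).
avg_sps = 5e9 / (46 * 3600)

print(f"simulation speedup:        {sim_speedup:.1f}x")
print(f"simulation + RL speedup:   {train_speedup:.1f}x")
print(f"mobile pick speedup:       {pick_speedup:.0f}x")
print(f"average SPS over 46 hours: {avg_sps:,.0f}")
```

With the rounded figures, the simulation ratio comes out at about 54.7x and the simulation+RL ratio at about 86.9x. The quoted 88x is consistent with a throughput somewhat above 108,000 SPS, and the quoted 100x for the pick skill is consistent with the stated bounds (more than 24 hours, less than 16 minutes).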
