

Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

June 13, 2023
作者: Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander
cs.AI

Abstract
We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for robotic mobile manipulation in indoor environments. Specifically, a Fetch robot (equipped with a mobile base, 7DoF arm, RGBD camera, egomotion, and onboard sensing) is spawned in a home environment and asked to rearrange objects: by navigating to an object, picking it up, navigating to a target location, and then placing the object at the target location. Galactic is fast. In terms of simulation speed (rendering + physics), Galactic achieves over 421,000 steps-per-second (SPS) on an 8-GPU node, which is 54x faster than Habitat 2.0 (7699 SPS). More importantly, Galactic was designed to optimize the entire rendering + physics + RL interplay, since any bottleneck in the interplay slows down training. In terms of simulation+RL speed (rendering + physics + inference + learning), Galactic achieves over 108,000 SPS, which is 88x faster than Habitat 2.0 (1243 SPS). These massive speed-ups not only drastically cut the wall-clock training time of existing experiments, but also unlock an unprecedented scale of new experiments. First, Galactic can train a mobile pick skill to >80% accuracy in under 16 minutes, a 100x speedup compared to the over 24 hours it takes to train the same skill in Habitat 2.0. Second, we use Galactic to perform the largest-scale experiment to date for rearrangement, using 5B steps of experience in 46 hours, which is equivalent to 20 years of robot experience. This scaling results in a single neural network composed of task-agnostic components achieving 85% success in GeometricGoal rearrangement, compared to 0% success reported in Habitat 2.0 for the same approach. The code is available at github.com/facebookresearch/galactic.
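The reported speedups follow directly from the steps-per-second (SPS) figures in the abstract. A minimal sketch of that arithmetic, assuming nothing beyond the published numbers (the function names are illustrative, not from the Galactic codebase):

```python
def steps_per_second(total_steps: int, wall_clock_seconds: float) -> float:
    """Throughput of a simulation/training loop in environment steps per second."""
    return total_steps / wall_clock_seconds

def speedup(fast_sps: float, baseline_sps: float) -> float:
    """Multiplicative speedup of one system over a baseline."""
    return fast_sps / baseline_sps

# Simulation-only (rendering + physics) figures on an 8-GPU node:
sim_speedup = speedup(421_000, 7_699)    # Galactic vs. Habitat 2.0, ~54x

# Full sim+RL loop (rendering + physics + inference + learning):
train_speedup = speedup(108_000, 1_243)  # ~87x with these rounded figures;
                                         # the abstract reports 88x

# The large-scale run (5B steps in 46 hours) corresponds to an average
# end-to-end throughput of roughly 30k SPS sustained over the whole run:
avg_sps = steps_per_second(5_000_000_000, 46 * 3600)
```

Note that the 54x and 88x figures are lower bounds in the abstract's phrasing ("over 421,000 SPS", "over 108,000 SPS"), so the exact ratios computed from the rounded numbers can differ slightly.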