Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

Abstract

We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for robotic mobile manipulation in indoor environments. Specifically, a Fetch robot (equipped with a mobile base, 7-DoF arm, RGBD camera, egomotion, and onboard sensing) is spawned in a home environment and asked to rearrange objects by navigating to an object, picking it up, navigating to a target location, and then placing the object at the target location. Galactic is fast. In terms of simulation speed (rendering + physics), Galactic achieves over 421,000 steps-per-second (SPS) on an 8-GPU node, which is 54x faster than Habitat 2.0 (7,699 SPS). More importantly, Galactic was designed to optimize the entire rendering + physics + RL interplay, since any bottleneck in the interplay slows down training. In terms of simulation+RL speed (rendering + physics + inference + learning), Galactic achieves over 108,000 SPS, which is 88x faster than Habitat 2.0 (1,243 SPS). These massive speed-ups not only drastically cut the wall-clock training time of existing experiments, but also unlock an unprecedented scale of new experiments. First, Galactic can train a mobile pick skill to >80% accuracy in under 16 minutes, a 100x speedup compared to the over 24 hours it takes to train the same skill in Habitat 2.0. Second, we use Galactic to perform the largest-scale experiment to date for rearrangement, using 5B steps of experience in 46 hours, which is equivalent to 20 years of robot experience. This scaling results in a single neural network composed of task-agnostic components achieving 85% success in GeometricGoal rearrangement, compared to 0% success reported in Habitat 2.0 for the same approach. The code is available at github.com/facebookresearch/galactic.
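The throughput figures above can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the Galactic codebase; the variable names are illustrative, and the inputs are the rounded numbers quoted in the abstract with the "over"/"under" qualifiers dropped. The ratios it prints are therefore approximate and need not reproduce the quoted 88x and 100x exactly, which were presumably computed from unrounded measurements. The last two values are derived quantities, not figures stated by the authors.

```python
# Rounded figures quoted in the abstract (illustrative variable names).
SIM_SPS_GALACTIC = 421_000     # rendering + physics, 8-GPU node
SIM_SPS_HABITAT = 7_699
TRAIN_SPS_GALACTIC = 108_000   # rendering + physics + inference + learning
TRAIN_SPS_HABITAT = 1_243

sim_speedup = SIM_SPS_GALACTIC / SIM_SPS_HABITAT        # about 54.7
train_speedup = TRAIN_SPS_GALACTIC / TRAIN_SPS_HABITAT  # about 86.9

# Mobile pick skill: "over 24 hours" versus "under 16 minutes".
pick_speedup = (24 * 60) / 16                           # 90.0 at the stated bounds

# Large-scale run: 5B steps in 46 hours of wall-clock time.
TOTAL_STEPS = 5_000_000_000
WALL_CLOCK_HOURS = 46
run_sps = TOTAL_STEPS / (WALL_CLOCK_HOURS * 3600)       # average SPS over the run

# Steps per second of robot time implied by "20 years of robot experience".
ROBOT_YEARS = 20
SECONDS_PER_YEAR = 365.25 * 24 * 3600
implied_step_rate_hz = TOTAL_STEPS / (ROBOT_YEARS * SECONDS_PER_YEAR)

if __name__ == "__main__":
    print(f"simulation speedup:        {sim_speedup:.1f}x")
    print(f"simulation+RL speedup:     {train_speedup:.1f}x")
    print(f"mobile pick speedup:       {pick_speedup:.0f}x")
    print(f"average SPS of 5B run:     {run_sps:,.0f}")
    print(f"implied robot step rate:   {implied_step_rate_hz:.2f} Hz")
```

Because the abstract gives only bounds ("over 421,000", "under 16 minutes"), the computed ratios are best read as rough consistency checks rather than exact reproductions of the reported speedups.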
