Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second
June 13, 2023
Authors: Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander
cs.AI
Abstract
We present Galactic, a large-scale simulation and reinforcement-learning (RL)
framework for robotic mobile manipulation in indoor environments. Specifically,
a Fetch robot (equipped with a mobile base, 7DoF arm, RGBD camera, egomotion,
and onboard sensing) is spawned in a home environment and asked to rearrange
objects - by navigating to an object, picking it up, navigating to a target
location, and then placing the object at the target location.
Galactic is fast. In terms of simulation speed (rendering + physics),
Galactic achieves over 421,000 steps-per-second (SPS) on an 8-GPU node, which
is 54x faster than Habitat 2.0 (7699 SPS). More importantly, Galactic was
designed to optimize the entire rendering + physics + RL interplay since any
bottleneck in the interplay slows down training. In terms of simulation+RL
speed (rendering + physics + inference + learning), Galactic achieves over
108,000 SPS, which is 88x faster than Habitat 2.0 (1243 SPS).
These massive speed-ups not only drastically cut the wall-clock training time
of existing experiments, but also unlock an unprecedented scale of new
experiments. First, Galactic can train a mobile pick skill to >80% accuracy in
under 16 minutes, a 100x speedup compared to the over 24 hours it takes to
train the same skill in Habitat 2.0. Second, we use Galactic to perform the
largest-scale experiment to date for rearrangement using 5B steps of experience
in 46 hours, which is equivalent to 20 years of robot experience. This scaling
results in a single neural network composed of task-agnostic components
achieving 85% success in GeometricGoal rearrangement, compared to 0% success
reported in Habitat 2.0 for the same approach. The code is available at
github.com/facebookresearch/galactic.
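
The throughput figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative, and the 365-day year is an assumption. Because the abstract gives bounds ("over 421,000", "over 108,000", "under 16 minutes", "over 24 hours"), the computed ratios are lower bounds rather than exact values.

```python
# Figures quoted in the abstract, in steps per second (SPS).
GALACTIC_SIM_SPS = 421_000    # rendering + physics on an 8-GPU node ("over 421,000")
HABITAT_SIM_SPS = 7_699
GALACTIC_TRAIN_SPS = 108_000  # rendering + physics + inference + learning ("over 108,000")
HABITAT_TRAIN_SPS = 1_243


def speedup(new_sps: float, old_sps: float) -> float:
    """Return how many times faster new_sps is than old_sps."""
    return new_sps / old_sps


sim_speedup = speedup(GALACTIC_SIM_SPS, HABITAT_SIM_SPS)        # roughly 54.7
train_speedup = speedup(GALACTIC_TRAIN_SPS, HABITAT_TRAIN_SPS)  # roughly 86.9

# Mobile pick skill: under 16 minutes versus over 24 hours.
pick_speedup = (24 * 60) / 16                                   # 90.0 from the stated bounds

# Large-scale run: 5B steps of experience in 46 hours of wall-clock time.
TOTAL_STEPS = 5_000_000_000
WALL_CLOCK_HOURS = 46
average_sps = TOTAL_STEPS / (WALL_CLOCK_HOURS * 3600)           # roughly 30,200

# "20 years of robot experience" implies a robot-time step rate.
# Assumption: a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600
steps_per_robot_second = TOTAL_STEPS / (20 * SECONDS_PER_YEAR)  # roughly 7.9

if __name__ == "__main__":
    print(f"simulation speedup:     {sim_speedup:.1f}x")
    print(f"simulation+RL speedup:  {train_speedup:.1f}x")
    print(f"mobile pick speedup:    {pick_speedup:.0f}x")
    print(f"average SPS, 5B run:    {average_sps:,.0f}")
    print(f"steps per robot-second: {steps_per_robot_second:.1f}")
```

The bounds give about 87x for simulation+RL and 90x for the mobile pick skill, which is consistent with the quoted 88x and 100x if the measured values sit somewhat beyond the stated bounds. The average rate of the 5B-step run works out to roughly 30,000 SPS; the abstract does not state the hardware configuration used for that run.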