Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
August 21, 2024
Authors: Ria Doshi, Homer Walke, Oier Mees, Sudeep Dasari, Sergey Levine
cs.AI
Abstract
Modern machine learning systems rely on large datasets to attain broad generalization, and this often poses a challenge in robot learning, where each robotic platform and task might have only a small dataset. By training a single policy across many different kinds of robots, a robot learning method can leverage much broader and more diverse datasets, which in turn can lead to better generalization and robustness. However, training a single policy on multi-robot data is challenging because robots can have widely varying sensors, actuators, and control frequencies. We propose CrossFormer, a scalable and flexible transformer-based policy that can consume data from any embodiment. We train CrossFormer on the largest and most diverse dataset to date, 900K trajectories across 20 different robot embodiments. We demonstrate that the same network weights can control vastly different robots, including single and dual arm manipulation systems, wheeled robots, quadcopters, and quadrupeds. Unlike prior work, our model does not require manual alignment of the observation or action spaces. Extensive experiments in the real world show that our method matches the performance of specialist policies tailored for each embodiment, while also significantly outperforming the prior state of the art in cross-embodiment learning.
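The central idea in the abstract, one set of transformer weights serving embodiments with different observation and action spaces, can be illustrated with a minimal sketch. The code below is an illustrative assumption rather than the paper's CrossFormer implementation: the module names, dimensions, and the two example embodiments ("arm" and "quadruped") are hypothetical, and the actual model handles far richer observations and action decoding than shown here.

```python
# Minimal sketch (not the authors' implementation): per-embodiment observation
# tokenizers feed a shared transformer backbone, and per-embodiment action
# heads decode actions of different dimensionality. All names and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

class CrossEmbodimentPolicy(nn.Module):
    def __init__(self, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        # One tokenizer per observation type, mapping raw inputs to d_model tokens.
        self.tokenizers = nn.ModuleDict({
            "arm_proprio": nn.Linear(7, d_model),        # e.g. 7-DoF joint angles
            "quadruped_proprio": nn.Linear(12, d_model),  # e.g. 12 joint angles
        })
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)  # shared weights
        # One action head per embodiment; output dimensions may differ freely.
        self.action_heads = nn.ModuleDict({
            "arm": nn.Linear(d_model, 7),         # e.g. end-effector deltas + gripper
            "quadruped": nn.Linear(d_model, 12),  # e.g. target joint positions
        })

    def forward(self, obs: dict, embodiment: str) -> torch.Tensor:
        # Tokenize whichever observations this embodiment provides.
        tokens = [self.tokenizers[k](v).unsqueeze(1) for k, v in obs.items()]
        x = self.backbone(torch.cat(tokens, dim=1))
        # Decode actions from the first token with the matching head.
        return self.action_heads[embodiment](x[:, 0])

# The same weights route data from two very different robots.
policy = CrossEmbodimentPolicy()
arm_action = policy({"arm_proprio": torch.randn(1, 7)}, embodiment="arm")
quad_action = policy({"quadruped_proprio": torch.randn(1, 12)}, embodiment="quadruped")
```

The sketch reflects the abstract's claim that no manual alignment of observation or action spaces is needed: each embodiment keeps its native input and output dimensions, and only the backbone is shared.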