Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation

Abstract

Modern machine learning systems rely on large datasets to attain broad generalization, and this often poses a challenge in robot learning, where each robotic platform and task might have only a small dataset. By training a single policy across many different kinds of robots, a robot learning method can leverage much broader and more diverse datasets, which in turn can lead to better generalization and robustness. However, training a single policy on multi-robot data is challenging because robots can have widely varying sensors, actuators, and control frequencies. We propose CrossFormer, a scalable and flexible transformer-based policy that can consume data from any embodiment. We train CrossFormer on the largest and most diverse dataset to date, 900K trajectories across 20 different robot embodiments. We demonstrate that the same network weights can control vastly different robots, including single- and dual-arm manipulation systems, wheeled robots, quadcopters, and quadrupeds. Unlike prior work, our model does not require manual alignment of the observation or action spaces. Extensive experiments in the real world show that our method matches the performance of specialist policies tailored for each embodiment, while also significantly outperforming the prior state of the art in cross-embodiment learning.