Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
August 21, 2024
Authors: Ria Doshi, Homer Walke, Oier Mees, Sudeep Dasari, Sergey Levine
cs.AI
Abstract
Modern machine learning systems rely on large datasets to attain broad
generalization, and this often poses a challenge in robot learning, where each
robotic platform and task might have only a small dataset. By training a single
policy across many different kinds of robots, a robot learning method can
leverage much broader and more diverse datasets, which in turn can lead to
better generalization and robustness. However, training a single policy on
multi-robot data is challenging because robots can have widely varying sensors,
actuators, and control frequencies. We propose CrossFormer, a scalable and
flexible transformer-based policy that can consume data from any embodiment. We
train CrossFormer on the largest and most diverse dataset to date, 900K
trajectories across 20 different robot embodiments. We demonstrate that the
same network weights can control vastly different robots, including single and
dual arm manipulation systems, wheeled robots, quadcopters, and quadrupeds.
Unlike prior work, our model does not require manual alignment of the
observation or action spaces. Extensive experiments in the real world show that
our method matches the performance of specialist policies tailored for each
embodiment, while also significantly outperforming the prior state of the art
in cross-embodiment learning.
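The abstract's central claim, that one set of transformer weights can serve arms, wheeled robots, quadcopters, and quadrupeds without manually aligned observation or action spaces, can be made concrete with a small sketch. The PyTorch code below is an illustrative assumption, not the CrossFormer implementation: per-modality tokenizers project heterogeneous observations into a shared token space, a single transformer trunk is shared across all embodiments, and lightweight per-embodiment readout heads decode actions of whatever dimension each robot needs. All names (`CrossEmbodimentPolicy`, `obs_dims`, `action_dims`) and sizes are hypothetical.

```python
# Minimal, hypothetical sketch of a cross-embodiment policy in the spirit of
# the abstract. Not the actual CrossFormer code; sizes and names are made up.
import torch
import torch.nn as nn


class CrossEmbodimentPolicy(nn.Module):
    def __init__(self, obs_dims: dict, action_dims: dict, d_model: int = 256):
        super().__init__()
        # One tokenizer per observation modality (e.g. proprioception, lidar),
        # so no manual alignment of observation spaces is needed.
        self.tokenizers = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in obs_dims.items()}
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True
        )
        # Shared trunk: the same weights serve every embodiment.
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        # A learned readout token plus a small head per embodiment, so action
        # spaces of different dimensions never need to be aligned either.
        self.readouts = nn.ParameterDict(
            {emb: nn.Parameter(torch.zeros(1, 1, d_model)) for emb in action_dims}
        )
        self.heads = nn.ModuleDict(
            {emb: nn.Linear(d_model, dim) for emb, dim in action_dims.items()}
        )

    def forward(self, embodiment: str, obs: dict) -> torch.Tensor:
        # Tokenize whichever modalities this embodiment actually provides.
        tokens = [self.tokenizers[name](x).unsqueeze(1) for name, x in obs.items()]
        batch = tokens[0].shape[0]
        readout = self.readouts[embodiment].expand(batch, -1, -1)
        seq = torch.cat(tokens + [readout], dim=1)
        out = self.trunk(seq)
        # Decode the action from the readout token's final representation.
        return self.heads[embodiment](out[:, -1])


# Hypothetical usage: the same trunk weights drive two very different robots.
policy = CrossEmbodimentPolicy(
    obs_dims={"arm_proprio": 14, "lidar": 64},
    action_dims={"bimanual_arm": 14, "quadruped": 12},
)
arm_action = policy("bimanual_arm", {"arm_proprio": torch.randn(2, 14)})
leg_action = policy("quadruped", {"lidar": torch.randn(2, 64)})
```

This mirrors the design choice the abstract emphasizes: tokenization makes the input interface uniform, so a new sensor or action space only adds a small adapter around the shared trunk rather than requiring a new per-robot policy. The real system additionally handles camera images and varying control frequencies, which this sketch omits.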