Octo: An Open-Source Generalist Robot Policy
May 20, 2024
Authors: Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine
cs.AI
Abstract
Large policies pretrained on diverse robot datasets have the potential to
transform robotic learning: instead of training new policies from scratch, such
generalist robot policies may be finetuned with only a little in-domain data,
yet generalize broadly. However, to be widely applicable across a range of
robotic learning scenarios, environments, and tasks, such policies need to
handle diverse sensors and action spaces, accommodate a variety of commonly
used robotic platforms, and finetune readily and efficiently to new domains. In
this work, we aim to lay the groundwork for developing open-source, widely
applicable, generalist policies for robotic manipulation. As a first step, we
introduce Octo, a large transformer-based policy trained on 800k trajectories
from the Open X-Embodiment dataset, the largest robot manipulation dataset to
date. It can be instructed via language commands or goal images and can be
effectively finetuned to robot setups with new sensory inputs and action spaces
within a few hours on standard consumer GPUs. In experiments across 9 robotic
platforms, we demonstrate that Octo serves as a versatile policy initialization
that can be effectively finetuned to new observation and action spaces. We also
perform detailed ablations of design decisions for the Octo model, from
architecture to training data, to guide future research on building generalist
robot models.