EgoZero: Robot Learning from Smart Glasses

Abstract

Despite recent progress in general-purpose robotics, robot policies still lag far behind basic human capabilities in the real world. Humans interact constantly with the physical world, yet this rich data resource remains largely untapped in robot learning. We propose EgoZero, a minimal system that learns robust manipulation policies from human demonstrations captured with Project Aria smart glasses, using zero robot data. EgoZero enables: (1) extraction of complete, robot-executable actions from in-the-wild, egocentric human demonstrations, (2) compression of human visual observations into morphology-agnostic state representations, and (3) closed-loop policy learning that generalizes morphologically, spatially, and semantically. We deploy EgoZero policies on a Franka Panda robot equipped with a gripper and demonstrate zero-shot transfer with a 70% success rate over 7 manipulation tasks, with only 20 minutes of data collection per task. Our results suggest that in-the-wild human data can serve as a scalable foundation for real-world robot learning, paving the way toward a future of abundant, diverse, and naturalistic training data for robots. Code and videos are available at https://egozero-robot.github.io.
