ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Abstract

Language-guided long-horizon mobile manipulation has long been a grand challenge spanning embodied semantic reasoning, generalizable manipulation, and adaptive locomotion. Three fundamental limitations hinder progress. First, although large language models have improved spatial reasoning and task planning through semantic priors, existing implementations remain confined to tabletop scenarios and fail to address the constrained perception and limited actuation range of mobile platforms. Second, current manipulation strategies generalize poorly to the diverse object configurations encountered in open-world environments. Third, although crucial for practical deployment, the dual requirement of maintaining high platform maneuverability alongside precise end-effector control in unstructured settings remains understudied. In this work, we present ODYSSEY, a unified mobile manipulation framework for agile quadruped robots equipped with manipulators, which seamlessly integrates high-level task planning with low-level whole-body control. To address the challenge of egocentric perception in language-conditioned tasks, we introduce a hierarchical planner powered by a vision-language model, enabling long-horizon instruction decomposition and precise action execution. At the control level, our novel whole-body policy achieves robust coordination across challenging terrains. We further present the first benchmark for long-horizon mobile manipulation, which evaluates diverse indoor and outdoor scenarios. Through successful sim-to-real transfer, we demonstrate the system's generalization and robustness in real-world deployments, underscoring the practicality of legged manipulators in unstructured environments. Our work advances the feasibility of generalized robotic assistants capable of complex, dynamic tasks. Project page: https://kaijwang.github.io/odyssey.github.io/
