

ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

August 11, 2025
作者: Kaijun Wang, Liqin Lu, Mingyu Liu, Jianuo Jiang, Zeju Li, Bolin Zhang, Wancai Zheng, Xinyi Yu, Hao Chen, Chunhua Shen
cs.AI

Abstract

Language-guided long-horizon mobile manipulation has long been a grand challenge in embodied semantic reasoning, generalizable manipulation, and adaptive locomotion. Three fundamental limitations hinder progress: First, although large language models have improved spatial reasoning and task planning through semantic priors, existing implementations remain confined to tabletop scenarios, failing to address the constrained perception and limited actuation ranges of mobile platforms. Second, current manipulation strategies exhibit insufficient generalization when confronted with the diverse object configurations encountered in open-world environments. Third, while crucial for practical deployment, the dual requirement of maintaining high platform maneuverability alongside precise end-effector control in unstructured settings remains understudied. In this work, we present ODYSSEY, a unified mobile manipulation framework for agile quadruped robots equipped with manipulators, which seamlessly integrates high-level task planning with low-level whole-body control. To address the challenge of egocentric perception in language-conditioned tasks, we introduce a hierarchical planner powered by a vision-language model, enabling long-horizon instruction decomposition and precise action execution. At the control level, our novel whole-body policy achieves robust coordination across challenging terrains. We further present the first benchmark for long-horizon mobile manipulation, evaluating diverse indoor and outdoor scenarios. Through successful sim-to-real transfer, we demonstrate the system's generalization and robustness in real-world deployments, underscoring the practicality of legged manipulators in unstructured environments. Our work advances the feasibility of generalized robotic assistants capable of complex, dynamic tasks. Our project page: https://kaijwang.github.io/odyssey.github.io/