ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Abstract

Language-guided long-horizon mobile manipulation has long been a grand challenge in embodied semantic reasoning, generalizable manipulation, and adaptive locomotion. Three fundamental limitations hinder progress. First, although large language models have improved spatial reasoning and task planning through semantic priors, existing implementations remain confined to tabletop scenarios and fail to address the constrained perception and limited actuation ranges of mobile platforms. Second, current manipulation strategies generalize poorly to the diverse object configurations encountered in open-world environments. Third, the dual requirement of maintaining high platform maneuverability alongside precise end-effector control in unstructured settings, although crucial for practical deployment, remains understudied. In this work, we present ODYSSEY, a unified mobile manipulation framework for agile quadruped robots equipped with manipulators, which seamlessly integrates high-level task planning with low-level whole-body control. To address the challenge of egocentric perception in language-conditioned tasks, we introduce a hierarchical planner powered by a vision-language model, enabling long-horizon instruction decomposition and precise action execution. At the control level, our novel whole-body policy achieves robust coordination across challenging terrains. We further present the first benchmark for long-horizon mobile manipulation, evaluating diverse indoor and outdoor scenarios. Through successful sim-to-real transfer, we demonstrate the system's generalization and robustness in real-world deployments, underscoring the practicality of legged manipulators in unstructured environments. Our work advances the feasibility of generalized robotic assistants capable of complex, dynamic tasks. Project page: https://kaijwang.github.io/odyssey.github.io/

