OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
September 11, 2025
Authors: Yuecheng Liu, Dafeng Chi, Shiguang Wu, Zhanguang Zhang, Yuzheng Zhuang, Bowen Yang, He Zhu, Lingfeng Zhang, Pengwei Xie, David Gamaliel Arcos Bravo, Yingxue Zhang, Jianye Hao, Xingyue Quan
cs.AI
Abstract
Recent advances in multimodal large language models (MLLMs) have opened new
opportunities for embodied intelligence, enabling multimodal understanding,
reasoning, and interaction, as well as continuous spatial decision-making.
Nevertheless, current MLLM-based embodied systems face two critical
limitations. First, the Geometric Adaptability Gap: models trained solely on 2D
inputs or with hard-coded 3D geometry injection suffer from either insufficient
spatial information or restricted 2D generalization, leading to poor
adaptability across tasks with diverse spatial demands. Second, the Embodiment
Constraint Gap: prior work often neglects the physical constraints and
capacities of real robots, resulting in task plans that are theoretically valid
but practically infeasible. To address these gaps, we introduce OmniEVA -- an
embodied versatile planner that enables advanced embodied reasoning and task
planning through two pivotal innovations: (1) a Task-Adaptive 3D Grounding
mechanism, which introduces a gated router to perform explicit selective
regulation of 3D fusion based on contextual requirements, enabling
context-aware 3D grounding for diverse embodied tasks; and (2) an Embodiment-Aware
Reasoning framework that jointly incorporates task goals and embodiment
constraints into the reasoning loop, resulting in planning decisions that are
both goal-directed and executable. Extensive experimental results demonstrate
that OmniEVA not only achieves state-of-the-art general embodied reasoning
performance, but also exhibits strong capabilities across a wide range of
downstream scenarios. Evaluations on a suite of proposed embodied benchmarks,
including both primitive and composite tasks, confirm its robust and versatile
planning capabilities. Project page: https://omnieva.github.io
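
The abstract does not give implementation details, but the Task-Adaptive 3D Grounding idea can be pictured as a learned gate that decides, per task context, how much 3D geometry to fuse into the 2D visual tokens. The following minimal sketch is not the authors' code; all module and tensor names (GatedRouter, feats_2d, feats_3d, task_ctx) are assumptions for illustration.

```python
# Illustrative sketch only; names and architecture are assumptions,
# not the OmniEVA implementation.
import torch
import torch.nn as nn

class GatedRouter(nn.Module):
    """Predicts a scalar gate from the task context and uses it to
    selectively fuse 3D geometric features into 2D visual tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, 1)
        )
        self.proj_3d = nn.Linear(dim, dim)  # align 3D features to the 2D token space

    def forward(self, feats_2d, feats_3d, task_ctx):
        # g in (0, 1): how much 3D geometry this particular task needs.
        g = torch.sigmoid(self.gate(task_ctx))                     # (B, 1)
        return feats_2d + g.unsqueeze(1) * self.proj_3d(feats_3d)  # (B, N, D)

router = GatedRouter(dim=256)
fused = router(torch.randn(2, 16, 256),   # 2D visual tokens
               torch.randn(2, 16, 256),   # 3D geometric features
               torch.randn(2, 256))       # task-context embedding
print(fused.shape)  # torch.Size([2, 16, 256])
```

Under this reading, a spatially demanding query (e.g., navigation) should drive the gate toward 1, while a purely semantic query can keep it near 0, preserving the model's 2D generalization.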
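
Similarly, the Embodiment-Aware Reasoning framework can be pictured as a feasibility check inside the planning loop, so that goal-directed but physically inexecutable steps are rejected. The sketch below uses hypothetical Embodiment and Action types and made-up constraint rules purely to illustrate the idea.

```python
# Hypothetical types and constraint rules, for illustration only.
from dataclasses import dataclass

@dataclass
class Embodiment:
    max_reach_m: float     # arm reach
    max_payload_kg: float  # lifting capacity

@dataclass
class Action:
    name: str
    target_dist_m: float   # distance to the manipulation target
    target_mass_kg: float  # mass of the object to manipulate

def feasible(a: Action, e: Embodiment) -> bool:
    """An action survives only if it respects the robot's physical limits."""
    return a.target_dist_m <= e.max_reach_m and a.target_mass_kg <= e.max_payload_kg

def plan(candidates: list[Action], e: Embodiment) -> list[Action]:
    # Keep goal-directed steps that are also executable on this embodiment.
    return [a for a in candidates if feasible(a, e)]

robot = Embodiment(max_reach_m=0.8, max_payload_kg=3.0)
steps = [Action("pick up the cup", 0.5, 0.3),
         Action("lift the sofa", 0.6, 40.0)]
print([a.name for a in plan(steps, robot)])  # ['pick up the cup']
```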