Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models
June 9, 2026
Authors: Yifu Yuan, Yaoting Huang, Xianze Yao, Yutong Li, Shuoheng Zhang, Linqi Han, Pengyi Li, Jiangeng Sun, Wenting Jia, Zhao Zhang, Yuhao Liu, Ruihao Liao, Yucheng Hu, Qiyu Wu, Yuxiao Li, Zibin Dong, Fei Ni, Yan Zheng, Shuyang Gu, Yi Ma, Hongyao Tang, Han Hu, Jianye Hao
cs.AI
Abstract
We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates a comprehensive set of embodied reasoning capabilities (embodied cognition, task planning, correction, and pointing) within a single architecture, as a step toward general physical intelligence. Three automated data construction pipelines significantly expand the data coverage of these critical capabilities, yielding a large-scale data system of over 15B tokens, and a multi-task balanced reinforcement learning (RL) recipe alleviates conflicts among heterogeneous tasks. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct its actions over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves state-of-the-art (SOTA) results on 16 of 24 embodied vision-language model (VLM) benchmarks, surpassing leading models such as Gemini-Robotics-ER-1.5 and GPT-5.4. Because these embodied capabilities are internalized, Embodied-R1.5 can be fine-tuned into a vision-language-action (VLA) model with only a small amount of data, outperforming leading VLA models such as π_{0.5} across 4 popular manipulation benchmark suites. Extensive zero-shot real-robot experiments validate its performance on instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source the model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research on EFMs.
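The abstract does not specify the interfaces of the Planner-Grounder-Corrector loop, so the following is only a minimal sketch of what such a closed loop might look like. Every name here (`run_pgc_loop`, `StepResult`, the four callables, `max_attempts`) is hypothetical and not taken from the released code. The sketch assumes that the planner returns a list of sub-steps, the grounder returns a 2D point for each step, and the corrector returns a success flag after execution; in the setting the abstract describes, the planner, grounder, and corrector roles would presumably all be served by the same model under different prompts.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical 2D image-space target produced by the grounder (pointing).
Point = Tuple[float, float]


@dataclass
class StepResult:
    step: str        # sub-task text produced by the planner
    attempts: int    # how many times the step was tried
    succeeded: bool  # final verdict of the corrector


def run_pgc_loop(
    instruction: str,
    plan: Callable[[str], List[str]],        # planner: instruction -> sub-steps
    ground: Callable[[str], Point],          # grounder: sub-step -> target point
    execute: Callable[[str, Point], None],   # low-level controller (assumed external)
    check: Callable[[str], bool],            # corrector: did the sub-step succeed?
    max_attempts: int = 3,                   # assumed retry budget per sub-step
) -> List[StepResult]:
    """Run each planned step until the corrector accepts it or retries run out."""
    results: List[StepResult] = []
    for step in plan(instruction):
        attempts = 0
        succeeded = False
        while attempts < max_attempts and not succeeded:
            attempts += 1
            target = ground(step)    # re-ground on every attempt
            execute(step, target)
            succeeded = check(step)  # closed-loop feedback from the corrector
        results.append(StepResult(step, attempts, succeeded))
        if not succeeded:
            break  # this sketch stops here; a real system might replan instead
    return results
```

Passing the four roles as callables keeps the loop independent of any particular model API, so the same function can be exercised with simple stubs before a real model and robot controller are plugged in.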