Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

June 9, 2026
Authors: Yifu Yuan, Yaoting Huang, Xianze Yao, Yutong Li, Shuoheng Zhang, Linqi Han, Pengyi Li, Jiangeng Sun, Wenting Jia, Zhao Zhang, Yuhao Liu, Ruihao Liao, Yucheng Hu, Qiyu Wu, Yuxiao Li, Zibin Dong, Fei Ni, Yan Zheng, Shuyang Gu, Yi Ma, Hongyao Tang, Han Hu, Jianye Hao
cs.AI

Abstract

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture, as a step toward general physical intelligence. Using three automated data construction pipelines that significantly expand data coverage of critical capabilities, we build a large-scale data system of over 15B tokens and design a multi-task balanced reinforcement learning (RL) recipe to alleviate conflicts between heterogeneous tasks. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves state-of-the-art (SOTA) results on 16 of 24 embodied vision-language model (VLM) benchmarks, surpassing leading models such as Gemini-Robotics-ER-1.5 and GPT-5.4. Because these embodied capabilities are internalized, Embodied-R1.5 can be fine-tuned into a vision-language-action (VLA) model with only a small amount of data, outperforming leading VLA models such as π_{0.5} across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments that validate performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source the model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research on EFMs.
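
The abstract describes the Planner-Grounder-Corrector (PGC) framework only at a high level. To make the closed-loop idea concrete, here is a minimal sketch of how one model might be queried in three roles. It is not the authors' implementation: `StubModel`, `run_pgc`, the comma-splitting planner, the dummy grounding points, and the retry policy are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[int, int]  # assumed 2D pixel target produced by "pointing"


@dataclass
class StubModel:
    """Hypothetical stand-in for a single model queried in three roles."""
    fail_once: set = field(default_factory=set)  # steps whose first attempt fails

    def plan(self, instruction: str) -> List[str]:
        # Planner role: split an instruction into ordered subtasks.
        return [s.strip() for s in instruction.split(",") if s.strip()]

    def ground(self, step: str) -> Point:
        # Grounder role: map a subtask to a target point (dummy values here).
        return (len(step), sum(map(ord, step)) % 100)

    def check(self, step: str, attempt: int) -> bool:
        # Corrector role: judge whether the executed step succeeded.
        return not (step in self.fail_once and attempt == 0)


def run_pgc(
    model: StubModel,
    instruction: str,
    execute: Callable[[str, Point], None],
    max_retries: int = 2,
) -> List[Tuple[str, int, bool]]:
    """Plan once, then ground, execute, and verify each step, retrying on failure."""
    log = []
    for step in model.plan(instruction):
        for attempt in range(max_retries + 1):
            point = model.ground(step)       # re-ground on every attempt
            execute(step, point)             # hand the target to a low-level controller
            ok = model.check(step, attempt)  # close the loop with a success check
            log.append((step, attempt, ok))
            if ok:
                break
        else:
            # No attempt succeeded within the retry budget.
            raise RuntimeError(f"step failed after retries: {step}")
    return log
```

Calling `run_pgc(StubModel(fail_once={"pick cup"}), "open drawer, pick cup, close drawer", lambda step, point: None)` makes the second step fail once and then succeed on retry, which is the self-correction behavior the abstract attributes to the closed loop. A real system would replace the stub methods with model queries over camera observations.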