Embodied-R1.5: Evolving Physical Intelligence through Embodied Foundation Models

Abstract

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities (embodied cognition, task planning, correction, and pointing) within a single architecture, as a step toward general physical intelligence. Using three automated data construction pipelines that significantly expand the data coverage of critical capabilities, we build a large-scale data system of over 15B tokens and design a multi-task balanced RL recipe to alleviate conflicts among heterogeneous tasks. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to execute autonomously and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves state-of-the-art results on 16 of 24 embodied VLM benchmarks, surpassing leading models such as Gemini-Robotics-ER-1.5 and GPT-5.4. Because these embodied capabilities are internalized, Embodied-R1.5 can be fine-tuned into a VLA with only a small amount of data, outperforming leading VLA models such as π_{0.5} across 4 popular manipulation benchmark suites. We also conduct extensive zero-shot real-robot experiments that validate performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source the model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research on EFMs.