

RoboBrain 2.0 Technical Report

July 2, 2025
Authors: BAAI RoboBrain Team, Mingyu Cao, Huajie Tan, Yuheng Ji, Minglan Lin, Zhiyu Li, Zhou Cao, Pengwei Wang, Enshen Zhou, Yi Han, Yingbo Tang, Xiangqi Xu, Wei Guo, Yaoxu Lyu, Yijie Xu, Jiayu Shi, Cheng Chi, Mengdi Zhao, Xiaoshuai Hao, Shanyu Rong, Zhengliang Cai, Bolun Zhang, Shuyi Zhang, Huaihai Lyu, Mengfei Du, Lingfeng Zhang, Xi Feng, Xiaodan Liu, Yance Jiao, Chenrui He, Mengsi Lyu, Zhuo Chen, Yulong Ao, Xue Sun, Zheqi He, Jingshu Zheng, Xi Yang, Donghai Shi, Kunchang Xie, Bochao Zhang, Shaokai Nie, Chunlei Men, Yonghua Lin, Zhongyuan Wang, Tiejun Huang, Shanghang Zhang
cs.AI

Abstract
We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain 2.0 achieves strong performance across a wide spectrum of embodied reasoning tasks. On both spatial and temporal benchmarks, the 32B variant achieves leading results, surpassing prior open-source and proprietary models. In particular, it supports key real-world embodied AI capabilities, including spatial understanding (e.g., affordance prediction, spatial referring, trajectory forecasting) and temporal decision-making (e.g., closed-loop interaction, multi-agent long-horizon planning, and scene graph updating). This report details the model architecture, data construction, multi-stage training strategies, infrastructure, and practical applications. We hope RoboBrain 2.0 advances embodied AI research and serves as a practical step toward building generalist embodied agents. The code, checkpoints, and benchmarks are available at https://superrobobrain.github.io.