
RoboBrain 2.5: Depth in Sight, Time in Mind

January 20, 2026
Authors: Huajie Tan, Enshen Zhou, Zhiyu Li, Yijie Xu, Yuheng Ji, Xiansheng Chen, Cheng Chi, Pengwei Wang, Huizhu Jia, Yulong Ao, Mingyu Cao, Sixiang Chen, Zhe Li, Mengzhen Liu, Zixiao Wang, Shanyu Rong, Yaoxu Lyu, Zhongxia Zhao, Peterson Co, Yibo Li, Yi Han, Shaoxuan Xie, Guocai Yao, Songjing Wang, Leiduo Zhang, Xi Yang, Yance Jiao, Donghai Shi, Kunchang Xie, Shaokai Nie, Chunlei Men, Yonghua Lin, Zhongyuan Wang, Tiejun Huang, Shanghang Zhang
cs.AI

Abstract

We introduce RoboBrain 2.5, a next-generation embodied AI foundation model that advances general perception, spatial reasoning, and temporal modeling through extensive training on high-quality spatiotemporal supervision. Building upon its predecessor, RoboBrain 2.5 introduces two major capability upgrades. First, it unlocks Precise 3D Spatial Reasoning by shifting from 2D pixel-relative grounding to depth-aware coordinate prediction and absolute metric constraint comprehension, generating complete 3D manipulation traces as ordered keypoint sequences under physical constraints. Second, complementing this spatial precision, the model establishes Dense Temporal Value Estimation, providing step-aware progress prediction and execution state understanding across varying viewpoints and producing stable feedback signals for downstream learning. Together, these upgrades extend the framework toward more physically grounded and execution-aware embodied intelligence for complex, fine-grained manipulation. The code and checkpoints are available at the project website: https://superrobobrain.github.io
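
To make the two upgrades concrete, the sketch below illustrates, under stated assumptions, what the model's outputs could look like downstream: a 3D manipulation trace represented as an ordered sequence of metric (depth-aware) keypoints, and a dense, step-aware progress value in [0, 1] usable as a feedback signal. This is a minimal illustrative sketch, not the paper's actual API; all class, field, and function names (Keypoint3D, ManipulationTrace, dense_progress) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Keypoint3D:
    """One waypoint of a manipulation trace in metric, depth-aware coordinates.

    Illustrative only: the abstract describes depth-aware coordinate prediction
    and absolute metric constraints, but does not publish this exact schema.
    """
    x: float  # meters, camera/world frame (assumed)
    y: float
    z: float  # depth-aware coordinate rather than a 2D pixel location


@dataclass
class ManipulationTrace:
    """A complete 3D manipulation trace as an ordered keypoint sequence."""
    keypoints: List[Keypoint3D]


def dense_progress(step: int, total_steps: int) -> float:
    """Toy stand-in for dense temporal value estimation: a step-aware progress
    signal in [0, 1] that a downstream learner could consume. The actual model
    predicts such values from observations across viewpoints."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


# Example: a three-keypoint pick trajectory with per-step progress values.
trace = ManipulationTrace(keypoints=[
    Keypoint3D(0.42, -0.10, 0.35),  # approach above the object
    Keypoint3D(0.42, -0.10, 0.12),  # descend to grasp height
    Keypoint3D(0.42, 0.20, 0.30),   # lift and move toward the goal
])
values = [dense_progress(i + 1, len(trace.keypoints)) for i in range(len(trace.keypoints))]
print(values)  # [0.333..., 0.666..., 1.0]
```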