RynnBrain: Open Embodied Foundation Models
February 13, 2026
Authors: Ronghao Dang, Jiayan Guo, Bohan Hou, Sicong Leng, Kehan Li, Xin Li, Jiangpin Liu, Yunxuan Mao, Zhikai Wang, Yuqian Yuan, Minghao Zhu, Xiao Lin, Yang Bai, Qian Jiang, Yaxi Zhao, Minghua Zeng, Junlong Gao, Yuming Jiang, Jun Cen, Siteng Huang, Liuyi Wang, Wenqiao Zhang, Chengju Liu, Jianfei Yang, Shijian Lu, Deli Zhao
cs.AI
Abstract
Despite rapid progress in multimodal foundation models, the embodied intelligence community still lacks a unified, physically grounded foundation model that integrates perception, reasoning, and planning within real-world spatiotemporal dynamics. We introduce RynnBrain, an open-source spatiotemporal foundation model for embodied intelligence. RynnBrain strengthens four core capabilities in a unified framework: comprehensive egocentric understanding, diverse spatiotemporal localization, physically grounded reasoning, and physics-aware planning. The RynnBrain family comprises three foundation model scales (2B, 8B, and 30B-A3B MoE) and four post-trained variants: three tailored for downstream embodied tasks (RynnBrain-Nav, RynnBrain-Plan, and RynnBrain-VLA) and one for complex spatial reasoning tasks (RynnBrain-CoP). In extensive evaluations on 20 embodied benchmarks and 8 general vision understanding benchmarks, the RynnBrain foundation models outperform existing embodied foundation models by a significant margin. The post-trained model suite further substantiates two key potentials of the RynnBrain foundation model: (i) enabling physically grounded reasoning and planning, and (ii) serving as a strong pretrained backbone that can be efficiently adapted to diverse embodied tasks.