RynnBrain: Open Embodied Foundation Models

Abstract

Despite rapid progress in multimodal foundation models, the embodied intelligence community still lacks a unified, physically grounded foundation model that integrates perception, reasoning, and planning within real-world spatiotemporal dynamics. We introduce RynnBrain, an open-source spatiotemporal foundation model for embodied intelligence. RynnBrain strengthens four core capabilities in a unified framework: comprehensive egocentric understanding, diverse spatiotemporal localization, physically grounded reasoning, and physics-aware planning. The RynnBrain family comprises three foundation model scales (2B, 8B, and 30B-A3B MoE) and four post-trained variants: three tailored for downstream embodied tasks (RynnBrain-Nav, RynnBrain-Plan, and RynnBrain-VLA) and one for complex spatial reasoning tasks (RynnBrain-CoP). Extensive evaluations on 20 embodied benchmarks and 8 general vision understanding benchmarks show that the RynnBrain foundation models outperform existing embodied foundation models by a significant margin. The post-trained model suite further substantiates two key potentials of the RynnBrain foundation model: (i) enabling physically grounded reasoning and planning, and (ii) serving as a strong pretrained backbone that can be efficiently adapted to diverse embodied tasks.