RynnBrain: Open Embodied Foundation Models

Abstract

Despite rapid progress in multimodal foundation models, the embodied intelligence community still lacks a unified, physically grounded foundation model that integrates perception, reasoning, and planning within real-world spatiotemporal dynamics. We introduce RynnBrain, an open-source spatiotemporal foundation model for embodied intelligence. RynnBrain strengthens four core capabilities in a unified framework: comprehensive egocentric understanding, diverse spatiotemporal localization, physically grounded reasoning, and physics-aware planning. The RynnBrain family comprises three foundation model scales (2B, 8B, and 30B-A3B MoE) and four post-trained variants: three tailored for downstream embodied tasks (RynnBrain-Nav, RynnBrain-Plan, and RynnBrain-VLA) and one for complex spatial reasoning tasks (RynnBrain-CoP). In extensive evaluations on 20 embodied benchmarks and 8 general vision understanding benchmarks, the RynnBrain foundation models outperform existing embodied foundation models by a significant margin. The post-trained model suite further substantiates two key potentials of the RynnBrain foundation model: (i) enabling physically grounded reasoning and planning, and (ii) serving as a strong pretrained backbone that can be efficiently adapted to diverse embodied tasks.
