MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation
February 11, 2026
作者: Yejin Kim, Wilbert Pumacay, Omar Rayyan, Max Argus, Winson Han, Eli VanderBilt, Jordi Salvador, Abhay Deshpande, Rose Hendrix, Snehal Jauhri, Shuo Liu, Nur Muhammad Mahi Shafiullah, Maya Guru, Ainaz Eftekhar, Karen Farley, Donovan Clay, Jiafei Duan, Arjun Guru, Piper Wolters, Alvaro Herrasti, Ying-Chun Lee, Georgia Chalvatzaki, Yuchen Cui, Ali Farhadi, Dieter Fox, Ranjay Krishna
cs.AI
Abstract
Deploying robots at scale demands robustness to the long tail of everyday situations. The countless variations in scene layout, object geometry, and task specifications that characterize real environments are vast and underrepresented in existing robot benchmarks. Measuring this level of generalization requires infrastructure at a scale and diversity that physical evaluation alone cannot provide. We introduce MolmoSpaces, a fully open ecosystem to support large-scale benchmarking of robot policies. MolmoSpaces consists of over 230k diverse indoor environments, ranging from handcrafted household scenes to procedurally generated multiroom houses, populated with 130k richly annotated object assets, including 48k manipulable objects with 42M stable grasps. Crucially, these environments are simulator-agnostic, supporting popular options such as MuJoCo, Isaac, and ManiSkill. The ecosystem supports the full spectrum of embodied tasks: static and mobile manipulation, navigation, and multiroom long-horizon tasks requiring coordinated perception, planning, and interaction across entire indoor environments. We also design MolmoSpaces-Bench, a benchmark suite of 8 tasks in which robots interact with our diverse scenes and richly annotated objects. Our experiments show MolmoSpaces-Bench exhibits strong sim-to-real correlation (R = 0.96, ρ = 0.98), confirm newer and stronger zero-shot policies outperform earlier versions in our benchmarks, and identify key sensitivities to prompt phrasing, initial joint positions, and camera occlusion. Through MolmoSpaces and its open-source assets and tooling, we provide a foundation for scalable data generation, policy training, and benchmark creation for robot learning research.
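The reported sim-to-real correlation pairs a Pearson coefficient (R, linear agreement) with a Spearman coefficient (ρ, rank agreement) over per-task success rates. As an illustrative sketch only, not code from the paper, the two statistics could be computed from paired sim/real success rates like so (the sample numbers below are hypothetical):

```python
def pearson(x, y):
    # Pearson R: covariance normalized by the two standard deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(x):
    # 1-based ranks, averaging over ties.
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman ρ is Pearson R applied to the rank-transformed data.
    return pearson(ranks(x), ranks(y))

# Hypothetical per-task success rates (sim vs. real) for illustration.
sim = [0.85, 0.60, 0.40, 0.90, 0.25]
real = [0.80, 0.55, 0.35, 0.88, 0.30]
print(pearson(sim, real), spearman(sim, real))
```

Spearman ρ can exceed Pearson R when the sim-to-real relationship is monotonic but not perfectly linear, which is why benchmarks often report both.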