MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation

Abstract

Deploying robots at scale demands robustness to the long tail of everyday situations. Real environments vary without limit in scene layout, object geometry, and task specification, and existing robot benchmarks underrepresent this variation. Measuring this level of generalization requires infrastructure at a scale and diversity that physical evaluation alone cannot provide. We introduce MolmoSpaces, a fully open ecosystem to support large-scale benchmarking of robot policies. MolmoSpaces consists of over 230k diverse indoor environments, ranging from handcrafted household scenes to procedurally generated multiroom houses, populated with 130k richly annotated object assets, including 48k manipulable objects with 42M stable grasps. Crucially, these environments are simulator-agnostic, supporting popular options such as MuJoCo, Isaac, and ManiSkill. The ecosystem supports the full spectrum of embodied tasks: static and mobile manipulation, navigation, and multiroom long-horizon tasks requiring coordinated perception, planning, and interaction across entire indoor environments. We also design MolmoSpaces-Bench, a benchmark suite of 8 tasks in which robots interact with our diverse scenes and richly annotated objects. Our experiments show that MolmoSpaces-Bench exhibits strong sim-to-real correlation (R = 0.96, ρ = 0.98), confirm that newer and stronger zero-shot policies outperform earlier versions on our benchmarks, and identify key sensitivities to prompt phrasing, initial joint positions, and camera occlusion. Through MolmoSpaces and its open-source assets and tooling, we provide a foundation for scalable data generation, policy training, and benchmark creation for robot learning research.
