World2Minecraft: Occupancy-Driven Simulated Scenes Construction

April 30, 2026
Authors: Lechao Zhang, Haoran Xu, Jingyu Gong, Xuhong Wang, Yuan Xie, Xin Tan
cs.AI

Abstract

Embodied intelligence requires high-fidelity simulation environments to support perception and decision-making, yet existing platforms often suffer from data contamination and limited flexibility. To address this, we propose World2Minecraft, a framework that converts real-world scenes into structured Minecraft environments based on 3D semantic occupancy prediction. In the reconstructed scenes, downstream tasks such as Vision-Language Navigation (VLN) can be performed directly. However, we observe that reconstruction quality depends heavily on accurate occupancy prediction, which remains limited by data scarcity and the poor generalization of existing models. We therefore introduce a low-cost, automated, and scalable data acquisition pipeline for creating customized occupancy datasets, and demonstrate its effectiveness with MinecraftOcc, a large-scale dataset of 100,165 images from 156 richly detailed indoor scenes. Extensive experiments show that our dataset provides a critical complement to existing datasets and poses a significant challenge to current state-of-the-art (SOTA) methods. These findings contribute to improving occupancy prediction and highlight the value of World2Minecraft as a customizable and editable platform for personalized embodied AI research. Project page: https://world2minecraft.github.io/.
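
To make the occupancy-to-Minecraft idea concrete, here is a minimal sketch of how a semantic occupancy result might be turned into block placements. It is not the authors' pipeline: the sparse voxel format, the class IDs, the `CLASS_TO_BLOCK` table, and the `occupancy_to_setblock` helper are all assumptions made for illustration. Only the `setblock` command syntax and the block identifiers are standard Minecraft.

```python
# Hypothetical mapping from semantic class IDs to Minecraft block IDs.
# These assignments are illustrative assumptions, not taken from the paper.
CLASS_TO_BLOCK = {
    1: "minecraft:stone",       # e.g. wall
    2: "minecraft:oak_planks",  # e.g. floor
    3: "minecraft:white_wool",  # e.g. furniture
}


def occupancy_to_setblock(voxels, origin=(0, 64, 0)):
    """Convert sparse semantic occupancy into Minecraft setblock commands.

    voxels: dict mapping (x, y, z) voxel indices to a semantic class ID,
            with y assumed to be the vertical axis (as in Minecraft).
    origin: world coordinate at which voxel (0, 0, 0) is placed.
    """
    ox, oy, oz = origin
    commands = []
    # Sort the voxel indices so the output order is deterministic.
    for (x, y, z) in sorted(voxels):
        block = CLASS_TO_BLOCK.get(voxels[(x, y, z)])
        if block is None:
            continue  # skip classes that have no assigned block
        commands.append(f"setblock {ox + x} {oy + y} {oz + z} {block}")
    return commands


if __name__ == "__main__":
    demo = {(0, 0, 0): 1, (1, 1, 0): 2, (1, 0, 1): 9}
    for line in occupancy_to_setblock(demo, origin=(10, 64, 20)):
        print(line)
```

In this sketch, voxels whose class has no entry in the table are skipped rather than mapped to a default block, so unknown predictions leave empty space instead of introducing spurious geometry.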