Map2World: Segment Map Conditioned Text to 3D World Generation

Abstract

3D world generation is essential for applications such as immersive content creation and autonomous driving simulation. Recent methods for 3D world generation have shown promising results; however, they are constrained to grid layouts and suffer from inconsistent object scale across the world. In this work, we introduce Map2World, a novel framework that for the first time enables 3D world generation conditioned on user-defined segment maps of arbitrary shape and scale, ensuring consistent global scale and flexible layout across expansive environments. To further improve quality, we propose a detail enhancer network that generates the fine structures of the world. By incorporating global structure information, the detail enhancer adds fine-grained details without compromising overall scene coherence. We design the entire pipeline to leverage strong priors from asset generators, achieving robust generalization across diverse domains even when training data for scene generation is limited. Extensive experiments demonstrate that our method significantly outperforms existing approaches in user controllability, scale consistency, and content coherence, enabling users to generate 3D worlds under more complex conditions.