

Map2World: Segment Map Conditioned Text to 3D World Generation

May 1, 2026
作者: Jaeyoung Chung, Suyoung Lee, Jianfeng Xiang, Jiaolong Yang, Kyoung Mu Lee
cs.AI

Abstract

3D world generation is essential for applications such as immersive content creation and autonomous driving simulation. Recent advances in 3D world generation have shown promising results; however, existing methods are constrained by grid layouts and suffer from inconsistent object scale across the world. In this work, we introduce Map2World, a novel framework that is the first to enable 3D world generation conditioned on user-defined segment maps of arbitrary shape and scale, ensuring globally consistent scale and flexible layouts across expansive environments. To further enhance quality, we propose a detail enhancer network that generates the fine details of the world. By incorporating global structure information, the detail enhancer adds fine-grained detail without compromising overall scene coherence. We design the entire pipeline to leverage strong priors from asset generators, achieving robust generalization across diverse domains even with limited training data for scene generation. Extensive experiments demonstrate that our method significantly outperforms existing approaches in user controllability, scale consistency, and content coherence, enabling users to generate 3D worlds under more complex conditions.
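To make the segment-map conditioning idea concrete, here is a minimal illustrative sketch (not the paper's actual pipeline; the region-to-asset interface and the single metres-per-pixel factor are our own assumptions): each labeled region of a user-drawn segment map is assigned its class, and one global scale factor keeps object sizes consistent across the whole map.

```python
import numpy as np

# Hypothetical sketch of segment-map-conditioned asset placement.
# Assumption: one global metres-per-pixel factor enforces scale
# consistency across the entire map, as the abstract motivates.
METERS_PER_PIXEL = 0.5

def place_assets(segment_map, class_names):
    """Return one (class, center_xy_m, footprint_m2) record per region label."""
    placements = []
    for label in np.unique(segment_map):
        mask = segment_map == label
        ys, xs = np.nonzero(mask)
        # Region centroid and area, converted to metric units by the
        # single global scale factor.
        center = (xs.mean() * METERS_PER_PIXEL, ys.mean() * METERS_PER_PIXEL)
        footprint = mask.sum() * METERS_PER_PIXEL ** 2
        placements.append((class_names[int(label)], center, footprint))
    return placements

# Tiny 4x4 user-drawn map: label 0 = road, label 1 = building.
seg = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 0, 0],
                [0, 0, 0, 0]])
print(place_assets(seg, {0: "road", 1: "building"}))
```

In a full system, each record would condition an asset generator for that class; here the point is only that arbitrary region shapes and a shared scale factor are enough to define where and how large each object should be.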
PDF · May 5, 2026