WorldGrow: Generating Infinite 3D World

Abstract

We tackle the challenge of generating infinitely extendable 3D worlds: large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges. 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, which limits their applicability to scene-level generation. Our key insight is to leverage the strong generation priors of pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves state-of-the-art performance in geometry reconstruction, while uniquely supporting infinite scene generation with photorealistic and structurally consistent outputs. These results highlight its capability for constructing large-scale virtual environments and its potential for building future world models.
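
The idea of context-aware scene extension by block inpainting can be illustrated with a toy sketch. This is not the WorldGrow implementation: the abstract gives no algorithmic details, so everything below is an assumption made for illustration. The names `inpaint_block` and `grow_world`, the 2D height-map blocks, the raster growth order, and the random fill rule are all hypothetical stand-ins for a learned 3D generator. The sketch shows only the control flow: each new block is generated conditioned on its already-generated neighbours, so the world can keep growing while seams stay continuous. The coarse-to-fine stage is not modelled.

```python
import random

BLOCK = 4  # cells per block side (toy size, assumed for illustration)


def inpaint_block(left, top, rng):
    """Hypothetical stand-in for a learned block inpainting model.

    Fills a BLOCK x BLOCK height map. Each cell is seeded from the mean of
    its already-known left and upper neighbours (taken from the adjacent
    blocks at the borders), then perturbed slightly, so new content stays
    continuous with the existing context.
    """
    block = [[0.0] * BLOCK for _ in range(BLOCK)]
    for r in range(BLOCK):
        for c in range(BLOCK):
            refs = []
            if c > 0:
                refs.append(block[r][c - 1])
            elif left is not None:
                refs.append(left[r][BLOCK - 1])  # right edge of left block
            if r > 0:
                refs.append(block[r - 1][c])
            elif top is not None:
                refs.append(top[BLOCK - 1][c])  # bottom edge of upper block
            base = sum(refs) / len(refs) if refs else 0.0
            block[r][c] = base + rng.uniform(-0.1, 0.1)
    return block


def grow_world(rows, cols, seed=0):
    """Grow a rows x cols grid of blocks in raster order.

    Every block after the first is conditioned on whichever of its left
    and upper neighbours already exist.
    """
    rng = random.Random(seed)
    world = {}
    for i in range(rows):
        for j in range(cols):
            left = world.get((i, j - 1))
            top = world.get((i - 1, j))
            world[(i, j)] = inpaint_block(left, top, rng)
    return world


if __name__ == "__main__":
    world = grow_world(2, 3)
    print(len(world), "blocks generated")
```

Because each block depends only on its immediate neighbours, the grid can be extended by any number of rows or columns without regenerating existing blocks, which is the property that makes block-wise growth attractive for unbounded scenes.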
