HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions
July 21, 2024
Authors: Haiyang Zhou, Xinhua Cheng, Wangbo Yu, Yonghong Tian, Li Yuan
cs.AI
Abstract
3D scene generation is in high demand across various domains, including
virtual reality, gaming, and the film industry. Owing to the powerful
generative capabilities of text-to-image diffusion models, which provide
reliable priors, creating 3D scenes from text prompts alone has become viable,
significantly advancing research in text-driven 3D scene generation.
To obtain multi-view supervision from 2D diffusion models,
prevailing methods typically use a diffusion model to generate an initial
local image and then iteratively outpaint that image with
diffusion models to gradually build up the scene. However, these
outpainting-based approaches are prone to producing globally inconsistent
scenes that lack a high degree of completeness, which restricts their
broader application. To tackle these problems, we introduce HoloDreamer, a
framework that first generates a high-definition panorama as a holistic
initialization of the full 3D scene and then leverages 3D Gaussian Splatting
(3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation
of view-consistent and fully enclosed 3D scenes. Specifically, we propose
Stylized Equirectangular Panorama Generation, a pipeline that combines multiple
diffusion models to enable stylized and detailed equirectangular panorama
generation from complex text prompts. Subsequently, Enhanced Two-Stage Panorama
Reconstruction is introduced, which conducts a two-stage optimization of 3D-GS
to inpaint the missing regions and enhance the integrity of the scene.
Comprehensive experiments demonstrate that our method outperforms prior work
in terms of overall visual consistency and harmony, as well as reconstruction
quality and rendering robustness, when generating fully enclosed scenes.
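
The abstract describes using an equirectangular panorama as the initialization of a full 3D scene. As a rough illustration of the geometry involved, the sketch below maps each equirectangular pixel to a unit viewing direction and scales it by a depth value to obtain a 3D point. This is not the authors' implementation: the axis convention, the function names `equirect_pixel_to_direction` and `lift_panorama`, and the availability of a per-pixel depth grid are all assumptions made for the example, since the abstract does not specify how depth is obtained.

```python
import math


def equirect_pixel_to_direction(col, row, width, height):
    """Map an equirectangular pixel centre to a unit viewing direction.

    Assumed convention (not taken from the paper): longitude runs from
    -pi to pi left to right, latitude runs from +pi/2 to -pi/2 top to
    bottom, y points up, and +z is the direction at the image centre.
    """
    lon = ((col + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (row + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def lift_panorama(depth):
    """Turn a height x width grid of depths into a list of 3D points.

    `depth` is a hypothetical input: a list of rows, each a list of
    distances along the viewing ray of the corresponding pixel.
    """
    height = len(depth)
    width = len(depth[0])
    points = []
    for row in range(height):
        for col in range(width):
            dx, dy, dz = equirect_pixel_to_direction(col, row, width, height)
            d = depth[row][col]
            points.append((d * dx, d * dy, d * dz))
    return points
```

A point set like this could plausibly seed a Gaussian-based reconstruction, since every pixel of the panorama contributes one point and the cloud surrounds the camera in all directions. Whether HoloDreamer uses exactly this step is not stated in the abstract.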