HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions
July 21, 2024
Authors: Haiyang Zhou, Xinhua Cheng, Wangbo Yu, Yonghong Tian, Li Yuan
cs.AI
Abstract
3D scene generation is in high demand across various domains, including
virtual reality, gaming, and the film industry. Owing to the powerful
generative capabilities of text-to-image diffusion models that provide reliable
priors, the creation of 3D scenes using only text prompts has become viable,
thereby significantly advancing research in text-driven 3D scene generation.
To obtain multi-view supervision from 2D diffusion models,
prevailing methods typically employ the diffusion model to generate an initial
local image, followed by iteratively outpainting the local image using
diffusion models to gradually generate scenes. Nevertheless, these
outpainting-based approaches are prone to producing globally inconsistent
scenes with a low degree of completeness, restricting their
broader applications. To tackle these problems, we introduce HoloDreamer, a
framework that first generates a high-definition panorama as a holistic
initialization of the full 3D scene, then leverages 3D Gaussian Splatting
(3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation
of view-consistent and fully enclosed 3D scenes. Specifically, we propose
Stylized Equirectangular Panorama Generation, a pipeline that combines multiple
diffusion models to enable stylized and detailed equirectangular panorama
generation from complex text prompts. Subsequently, Enhanced Two-Stage Panorama
Reconstruction is introduced, conducting a two-stage optimization of 3D-GS to
inpaint the missing region and enhance the integrity of the scene.
Comprehensive experiments demonstrate that our method outperforms prior work
in terms of overall visual consistency and harmony as well as reconstruction
quality and rendering robustness when generating fully enclosed scenes.
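
The abstract does not say how the equirectangular panorama is lifted into 3D before 3D-GS optimization. As a rough illustration of the geometry involved, the sketch below maps each panorama pixel to a unit ray on the sphere and scales it by a per-pixel depth to obtain a point cloud that could seed a reconstruction. The depth map, the axis convention, and the function names `pixel_to_ray` and `lift_panorama` are assumptions made for this example, not details taken from the paper.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map equirectangular pixel (u, v) to a unit ray direction (x, y, z).

    Assumed convention (not from the paper): longitude spans [-pi, pi) from
    left to right, latitude spans [pi/2, -pi/2] from top to bottom, y is up,
    and z points forward at the image centre.
    """
    # Sample at pixel centres, hence the +0.5 offsets.
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def lift_panorama(depth):
    """Turn a per-pixel depth map (list of rows) into a list of 3D points.

    Each point is the pixel's unit ray scaled by its depth, so the camera
    sits at the origin and the points surround it on all sides.
    """
    height = len(depth)
    width = len(depth[0])
    points = []
    for v in range(height):
        for u in range(width):
            dx, dy, dz = pixel_to_ray(u, v, width, height)
            d = depth[v][u]
            points.append((d * dx, d * dy, d * dz))
    return points
```

Because every pixel of an equirectangular image corresponds to a direction on the full sphere, a single panorama with depth already covers the whole field of view around the camera, which is consistent with the abstract's description of the panorama as a holistic initialization of a fully enclosed scene.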