ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors

Abstract

We introduce ShowRoom3D, a three-stage approach for generating high-quality 3D room-scale scenes from text. Previous methods that use 2D diffusion priors to optimize neural radiance fields (NeRF) for room-scale scene generation have produced unsatisfactory quality. This is primarily attributed to two limitations: 2D priors lack 3D awareness, and the training methodology imposes constraints. In this paper, we use a 3D diffusion prior, MVDiffusion, to optimize the 3D room-scale scene. Our contributions are twofold. First, we propose a progressive view selection process to optimize NeRF, which divides training into three stages and gradually expands the camera sampling scope. Second, we propose a pose transformation method in the second stage, which ensures that MVDiffusion provides accurate view guidance. As a result, ShowRoom3D generates rooms with improved structural integrity, enhanced clarity from any view, reduced content repetition, and higher consistency across different perspectives. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by a large margin in a user study.
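The progressive view selection idea (three training stages with a gradually expanding camera sampling scope) can be illustrated with a minimal sketch. This is not the authors' implementation: the even split of training steps, the `StageScope` fields, and every numeric bound below are assumptions chosen only to show how a sampling scope might widen from stage to stage.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class StageScope:
    """Hypothetical per-stage camera sampling bounds (not from the paper)."""
    max_offset: float     # max horizontal distance of the camera from the room center
    max_pitch_deg: float  # max absolute pitch angle, in degrees


# Assumed schedule: each stage allows a wider range of camera poses.
STAGES = (
    StageScope(max_offset=0.0, max_pitch_deg=0.0),   # stage 1: fixed center, level views
    StageScope(max_offset=0.3, max_pitch_deg=15.0),  # stage 2: small offsets and tilts
    StageScope(max_offset=0.6, max_pitch_deg=30.0),  # stage 3: widest scope
)


def stage_for_step(step: int, total_steps: int) -> int:
    """Map a training step to a stage index, assuming an even three-way split."""
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    return min(step * len(STAGES) // total_steps, len(STAGES) - 1)


def sample_camera(step: int, total_steps: int, rng: random.Random):
    """Sample (position, yaw_deg, pitch_deg) within the current stage's scope."""
    scope = STAGES[stage_for_step(step, total_steps)]
    # Uniform sampling over a disc of radius max_offset around the room center.
    radius = scope.max_offset * math.sqrt(rng.random())
    angle = rng.uniform(0.0, 2.0 * math.pi)
    position = (radius * math.cos(angle), radius * math.sin(angle), 0.0)
    yaw_deg = rng.uniform(0.0, 360.0)
    pitch_deg = rng.uniform(-scope.max_pitch_deg, scope.max_pitch_deg)
    return position, yaw_deg, pitch_deg
```

In an optimization loop, `sample_camera(step, total_steps, rng)` would be called once per iteration to pick the view to render and supervise. The actual stage boundaries and pose ranges used by ShowRoom3D are not stated in the abstract.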
