Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion
May 16, 2024
Authors: Xinyang Li, Zhangyu Lai, Linning Xu, Jianfei Guo, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji
cs.AI
Abstract
We present Dual3D, a novel text-to-3D generation framework that generates
high-quality 3D assets from texts in only 1 minute. The key component is a
dual-mode multi-view latent diffusion model. Given the noisy multi-view
latents, the 2D mode can efficiently denoise them with a single latent
denoising network, while the 3D mode can generate a tri-plane neural surface
for consistent rendering-based denoising. Most modules for both modes are tuned
from a pre-trained text-to-image latent diffusion model to circumvent the
expensive cost of training from scratch. To overcome the high rendering cost
during inference, we propose the dual-mode toggling inference strategy to use
only 1/10 denoising steps with 3D mode, successfully generating a 3D asset in
just 10 seconds without sacrificing quality. The texture of the 3D asset can
be further enhanced by our efficient texture refinement process in a short
time. Extensive experiments demonstrate that our method delivers
state-of-the-art performance while significantly reducing generation time. Our
project page is available at https://dual3d.github.io.
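
The abstract describes the dual-mode toggling inference only at a high level. Below is a minimal, hypothetical Python sketch of that idea, assuming placeholder callables (denoise_2d, denoise_3d, scheduler_step) that stand in for, but are not, the authors' actual interfaces: the costly rendering-based 3D mode is invoked on roughly 1/10 of the denoising steps, while the fast 2D mode handles the rest.

```python
# Hypothetical sketch of dual-mode toggling inference (not the authors' code).
# All callables below are assumed placeholders for the paper's components.
from typing import Any, Callable, Sequence


def dual_mode_toggling_inference(
    latents: Any,
    timesteps: Sequence[int],
    denoise_2d: Callable,      # fast per-view latent denoising network (2D mode)
    denoise_3d: Callable,      # tri-plane surface + rendering-based denoising (3D mode)
    scheduler_step: Callable,  # diffusion update: (noise_pred, t, latents) -> latents
    toggle_every: int = 10,
):
    """Denoise multi-view latents, switching to the expensive 3D mode only on
    every `toggle_every`-th step (about 1/10 of all steps)."""
    for i, t in enumerate(timesteps):
        if i % toggle_every == 0:
            noise_pred = denoise_3d(latents, t)  # 3D-consistent but costly
        else:
            noise_pred = denoise_2d(latents, t)  # fast, per-view denoising
        latents = scheduler_step(noise_pred, t, latents)
    return latents
```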