Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion
May 16, 2024
Authors: Xinyang Li, Zhangyu Lai, Linning Xu, Jianfei Guo, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji
cs.AI
Abstract
We present Dual3D, a novel text-to-3D generation framework that generates
high-quality 3D assets from texts in only 1 minute. The key component is a
dual-mode multi-view latent diffusion model. Given the noisy multi-view
latents, the 2D mode can efficiently denoise them with a single latent
denoising network, while the 3D mode can generate a tri-plane neural surface
for consistent rendering-based denoising. Most modules for both modes are tuned
from a pre-trained text-to-image latent diffusion model to circumvent the
expensive cost of training from scratch. To overcome the high rendering cost
during inference, we propose a dual-mode toggling inference strategy that uses
the 3D mode for only 1/10 of the denoising steps, successfully generating a 3D asset in
just 10 seconds without sacrificing quality. The texture of the 3D asset can
be further enhanced by our efficient texture refinement process in a short
time. Extensive experiments demonstrate that our method delivers
state-of-the-art performance while significantly reducing generation time. Our
project page is available at https://dual3d.github.io.
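
The abstract describes the dual-mode toggling inference strategy only at a high level. The sketch below is a rough illustration of how such a schedule could be organized, not the authors' implementation: the function names (denoise_2d, denoise_3d, scheduler_step) and the overall signature are hypothetical placeholders, and only the idea of invoking the 3D (rendering-based) mode on roughly 1/10 of the denoising steps comes from the abstract.

```python
def dual_mode_toggling_inference(
    latents,            # noisy multi-view latents
    denoise_2d,         # callable(latents, t) -> denoised latents (fast 2D mode)
    denoise_3d,         # callable(latents, t) -> (denoised latents, tri_plane) (3D mode)
    scheduler_step,     # callable(latents, denoised, t) -> latents at the next noise level
    timesteps,          # decreasing sequence of diffusion timesteps
    three_d_every=10,   # run the 3D mode on ~1/10 of the steps, as stated in the abstract
):
    """Hypothetical sketch of a dual-mode denoising loop.

    Most steps use the cheap 2D mode (a single latent denoising network over all
    views); a small fraction of steps use the 3D mode, which fits a tri-plane
    neural surface and denoises via rendering to keep the views consistent.
    """
    tri_plane = None
    for i, t in enumerate(timesteps):
        if i % three_d_every == 0:
            # 3D mode: rendering-based denoising enforces cross-view consistency.
            denoised, tri_plane = denoise_3d(latents, t)
        else:
            # 2D mode: fast per-step denoising without a rendering pass.
            denoised = denoise_2d(latents, t)
        # Standard diffusion update toward the next (lower) noise level.
        latents = scheduler_step(latents, denoised, t)
    # The last tri-plane surface would be the 3D asset handed to texture refinement.
    return latents, tri_plane
```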