Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

March 15, 2024
Authors: Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
cs.AI

Abstract

While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl can offer 3D diffusion guidance for optimization-based 3D generation. 2) We propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and the score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.
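To make the conditioning mechanism concrete, here is a minimal PyTorch sketch of a ControlNet-style conditioning module that produces local feature maps from a condition image and a global embedding from a camera pose. The class name, layer sizes, and the flattened 3x4 pose input are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of an MVControl-style conditioning module (not the authors' code).
import torch
import torch.nn as nn

class ConditioningModule(nn.Module):
    """Maps a condition image (edge/depth/normal/scribble map) and a camera pose
    to local feature maps and a global embedding used to modulate a frozen
    base multi-view diffusion model."""
    def __init__(self, cond_channels=3, feat_dim=320, pose_dim=12, embed_dim=768):
        super().__init__()
        # Local branch: shallow CNN over the condition image.
        self.local_encoder = nn.Sequential(
            nn.Conv2d(cond_channels, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, feat_dim, 3, stride=2, padding=1),
        )
        # Global branch: MLP over the flattened camera pose (e.g. a 3x4 extrinsic matrix).
        self.global_encoder = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.SiLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, cond_image, camera_pose):
        local = self.local_encoder(cond_image)         # (B, feat_dim, H/8, W/8)
        global_emb = self.global_encoder(camera_pose)  # (B, embed_dim)
        return local, global_emb

# Usage with four views' condition maps and flattened camera matrices.
module = ConditioningModule()
cond = torch.randn(4, 3, 256, 256)
poses = torch.randn(4, 12)
local_feats, global_emb = module(cond, poses)
print(local_feats.shape, global_emb.shape)  # (4, 320, 32, 32), (4, 768)
```

In the paper's design, the local embedding steers the control branch of the pre-trained multi-view diffusion model while the global embedding conditions it alongside the usual timestep and text embeddings; the exact injection points are not shown here and would follow the trained architecture.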
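The SuGaR-style binding of Gaussians to mesh triangle faces can likewise be sketched: each Gaussian center is expressed in fixed barycentric coordinates of a triangle, so the centers are a differentiable function of the mesh vertices and deforming the mesh sculpts the splats directly. The helper below, its name, and the number of Gaussians per face are assumptions for illustration; only the barycentric-binding idea follows the paper:

```python
# Hypothetical sketch of binding Gaussian centers to mesh triangles (SuGaR-style).
import torch

def bind_gaussians_to_mesh(vertices, faces, n_per_face=4, seed=0):
    """Sample fixed barycentric weights per triangle; Gaussian centers are then
    a differentiable function of the mesh vertices, so mesh edits move the splats."""
    g = torch.Generator().manual_seed(seed)
    F = faces.shape[0]
    # Random barycentric weights, fixed after initialization.
    w = torch.rand(F, n_per_face, 3, generator=g)
    w = w / w.sum(dim=-1, keepdim=True)
    tri = vertices[faces]                            # (F, 3, 3) triangle corner positions
    centers = torch.einsum('fnk,fkc->fnc', w, tri)   # (F, n_per_face, 3)
    return centers.reshape(-1, 3), w

# Toy mesh: a single triangle with differentiable vertices.
verts = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]], requires_grad=True)
faces = torch.tensor([[0, 1, 2]])
centers, bary = bind_gaussians_to_mesh(verts, faces)
centers.sum().backward()  # gradients flow back to the mesh vertices
print(centers.shape, verts.grad.shape)  # (4, 3), (3, 3)
```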
