

Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

March 15, 2024
Authors: Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
cs.AI

Abstract

While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. 2) We propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithms. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.
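The key idea behind the SuGaR-style hybrid representation — binding each Gaussian to a mesh triangle via fixed barycentric coordinates, so that sculpting the mesh moves the Gaussians with it — can be sketched as follows. This is a minimal illustration under assumed data layouts (NumPy arrays for vertices and faces), not the paper's actual implementation, and the function names are hypothetical:

```python
import numpy as np

def bind_gaussians_to_faces(vertices, faces, n_per_face=3, seed=0):
    """Sample fixed barycentric coordinates on each triangle face.

    Each Gaussian center is a convex (barycentric) combination of its
    face's three vertices, so the centers are tied to the mesh surface.
    Returns the barycentric weights and the initial centers.
    """
    rng = np.random.default_rng(seed)
    # Uniform sampling over a triangle: sort two uniforms, take gaps.
    u = rng.random((len(faces), n_per_face, 2))
    u.sort(axis=-1)
    bary = np.stack([u[..., 0], u[..., 1] - u[..., 0], 1.0 - u[..., 1]],
                    axis=-1)                      # (F, n, 3), rows sum to 1
    tri = vertices[faces]                         # (F, 3, 3) face corners
    centers = np.einsum('fnk,fkd->fnd', bary, tri)  # (F, n, 3)
    return bary, centers

def update_centers(vertices, faces, bary):
    """Recompute Gaussian centers after the mesh vertices are edited;
    the barycentric weights stay fixed, so Gaussians follow the surface."""
    return np.einsum('fnk,fkd->fnd', bary, vertices[faces])
```

Because the weights are fixed at binding time, any edit to the mesh geometry (e.g. a sculpting step during optimization) deterministically repositions every bound Gaussian, which is what keeps the splats surface-aligned.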

