Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
December 7, 2023
Authors: Lihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue
cs.AI
Abstract
Most 3D generation research focuses on up-projecting 2D foundation models
into the 3D space, either by minimizing 2D Score Distillation Sampling (SDS)
loss or fine-tuning on multi-view datasets. Without explicit 3D priors, these
methods often lead to geometric anomalies and multi-view inconsistency.
Recently, researchers have attempted to improve the genuineness of 3D objects
by directly training on 3D datasets, albeit at the cost of low-quality texture
generation due to the limited texture diversity in 3D datasets. To harness the
advantages of both approaches, we propose Bidirectional Diffusion (BiDiff), a
unified framework that incorporates both a 3D and a 2D diffusion process, to
preserve both 3D fidelity and 2D texture richness, respectively. Moreover, as a
simple combination may yield inconsistent generation results, we further bridge
them with novel bidirectional guidance. In addition, our method can be used as
an initialization for optimization-based models to further improve the quality
of the 3D models and the efficiency of optimization, reducing the generation process
from 3.4 hours to 20 minutes. Experimental results have shown that our model
achieves high-quality, diverse, and scalable 3D generation. Project website:
https://bidiff.github.io/.
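
To make the coupling between the two branches more concrete, below is a minimal, runnable sketch of how a bidirectional sampling loop could let a 3D branch and a multi-view 2D branch guide each other at every denoising step. All names, tensor shapes, and the stub denoisers are illustrative assumptions (a real noise schedule and the actual BiDiff networks are omitted); consult the project page at https://bidiff.github.io/ for the authors' implementation.

```python
# Illustrative sketch only: stub modules stand in for the real 3D and 2D
# diffusion branches, and the noise schedule is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_VIEWS, IMG, VOX, STEPS = 4, 32, 16, 50

class Stub3DDenoiser(nn.Module):
    """Placeholder for the 3D diffusion branch (e.g., over a feature volume)."""
    def __init__(self):
        super().__init__()
        # Input channels: noisy volume + lifted 2D guidance.
        self.net = nn.Conv3d(2, 1, kernel_size=3, padding=1)

    def forward(self, x3d, guide_3d, t):  # timestep unused in this stub
        return self.net(torch.cat([x3d, guide_3d], dim=1))

class Stub2DDenoiser(nn.Module):
    """Placeholder for the multi-view 2D diffusion branch."""
    def __init__(self):
        super().__init__()
        # Input channels: noisy view + rendered 3D guidance.
        self.net = nn.Conv2d(2, 1, kernel_size=3, padding=1)

    def forward(self, x2d, guide_2d, t):  # timestep unused in this stub
        return self.net(torch.cat([x2d, guide_2d], dim=1))

def lift_to_3d(x2d):
    """Toy stand-in for lifting multi-view 2D features into a 3D volume."""
    pooled = x2d.mean(dim=0, keepdim=True)          # (1, 1, IMG, IMG)
    vol = F.interpolate(pooled, size=(VOX, VOX))    # (1, 1, VOX, VOX)
    return vol.unsqueeze(2).expand(1, 1, VOX, VOX, VOX)

def render_views(x3d):
    """Toy stand-in for rendering the 3D volume into each camera view."""
    proj = x3d.mean(dim=2)                          # (1, 1, VOX, VOX)
    img = F.interpolate(proj, size=(IMG, IMG))      # (1, 1, IMG, IMG)
    return img.expand(N_VIEWS, 1, IMG, IMG)

@torch.no_grad()
def bidirectional_sample(d3d, d2d):
    x3d = torch.randn(1, 1, VOX, VOX, VOX)          # noisy 3D representation
    x2d = torch.randn(N_VIEWS, 1, IMG, IMG)         # noisy multi-view images
    for t in reversed(range(STEPS)):
        # 2D -> 3D guidance: texture-rich views steer the geometry branch.
        x3d = d3d(x3d, lift_to_3d(x2d), t)
        # 3D -> 2D guidance: the current shape keeps the views consistent.
        x2d = d2d(x2d, render_views(x3d), t)
    # The joint result could then initialize an SDS-style refinement stage.
    return x3d, x2d

shape, views = bidirectional_sample(Stub3DDenoiser(), Stub2DDenoiser())
print(shape.shape, views.shape)  # torch.Size([1, 1, 16, 16, 16]) torch.Size([4, 1, 32, 32])
```

The key structural point this sketch tries to capture is the abstract's bidirectional guidance: at each step the 2D branch conditions the 3D update and the refreshed 3D state conditions the 2D update, rather than running the two diffusion processes independently and merging them afterward.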