IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation
February 13, 2024
Authors: Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos
cs.AI
Abstract
Most text-to-3D generators build upon off-the-shelf text-to-image models
trained on billions of images. They use variants of Score Distillation Sampling
(SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation
is to fine-tune the 2D generator to be multi-view aware, which can help
distillation or can be combined with reconstruction networks to output 3D
objects directly. In this paper, we further explore the design space of
text-to-3D models. We significantly improve multi-view generation by
considering video instead of image generators. Combined with a 3D
reconstruction algorithm which, by using Gaussian splatting, can optimize a
robust image-based loss, we directly produce high-quality 3D outputs from the
generated views. Our new method, IM-3D, reduces the number of evaluations of
the 2D generator network by 10-100x, resulting in a much more efficient
pipeline, better quality, fewer geometric inconsistencies, and a higher yield
of usable 3D assets.
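For context, the Score Distillation Sampling objective referenced above follows the standard formulation from the literature (introduced by DreamFusion); the notation below is that generic form, not anything specific to IM-3D:

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
= \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\, \frac{\partial x}{\partial \theta} \,\right],
\qquad x = g(\theta),
$$

where $g$ renders the 3D representation $\theta$ into an image $x$, $x_t$ is $x$ noised to timestep $t$, $y$ is the text prompt, $\hat{\epsilon}_\phi$ is the frozen 2D diffusion model's noise prediction, and $w(t)$ is a timestep weighting. Each optimization step costs one forward pass of the 2D generator, and SDS pipelines typically run thousands of such steps; that per-step evaluation cost is what the abstract's 10-100x reduction refers to.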
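The reconstruction stage described in the abstract fits Gaussian splats directly to the generated views with a robust image-space loss. Below is a minimal sketch of that general technique, not IM-3D's actual implementation: `render_gaussians` is a hypothetical differentiable rasterizer standing in for a real Gaussian-splatting backend, and the LPIPS-plus-L1 loss is one plausible choice of robust image-based objective.

```python
# Sketch: fit 3D Gaussians to generated multiview images by minimizing
# a robust image-space loss. `render_gaussians` is a hypothetical
# differentiable rasterizer (a real backend, e.g. the 3DGS reference
# renderer, would be substituted here).
import torch
import lpips  # perceptual loss; pip install lpips

perceptual = lpips.LPIPS(net="vgg")  # frozen perceptual metric

def fit_gaussians(gaussians, target_views, cameras, steps=500, lr=1e-2):
    """gaussians: dict of tensors (means, scales, rotations, colors,
    opacities), each with requires_grad=True; target_views: (N, 3, H, W)
    generated images in [-1, 1]; cameras: per-view camera parameters."""
    opt = torch.optim.Adam(list(gaussians.values()), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for img, cam in zip(target_views, cameras):
            # Hypothetical renderer: returns a (3, H, W) image in [-1, 1].
            rendered = render_gaussians(gaussians, cam)
            # A perceptual loss tolerates small multiview inconsistencies
            # far better than a plain per-pixel L2 would.
            loss = loss + perceptual(rendered[None], img[None]).mean()
            loss = loss + 0.1 * (rendered - img).abs().mean()  # L1 term
        loss.backward()
        opt.step()
    return gaussians
```

Because the views come from a (multiview-aware) generator rather than from SDS distillation, the 2D network is evaluated only to produce the target images once; the fitting loop above then touches only the lightweight splatting renderer, which is where the paper's claimed efficiency gain comes from.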