Generic 3D Diffusion Adapter Using Controlled Multi-View Editing
March 18, 2024
Authors: Hansheng Chen, Ruoxi Shi, Yulin Liu, Bokui Shen, Jiayuan Gu, Gordon Wetzstein, Hao Su, Leonidas Guibas
cs.AI
Abstract
Open-domain 3D object synthesis has been lagging behind image synthesis due
to limited data and higher computational complexity. To bridge this gap, recent
works have investigated multi-view diffusion but often fall short in 3D
consistency, visual quality, or efficiency. This paper proposes MVEdit, which
functions as a 3D counterpart of SDEdit, employing ancestral sampling to
jointly denoise multi-view images and output high-quality textured meshes.
Built on off-the-shelf 2D diffusion models, MVEdit achieves 3D consistency
through a training-free 3D Adapter, which lifts the 2D views of the last
timestep into a coherent 3D representation, then conditions the 2D views of the
next timestep using rendered views, without compromising visual quality. With
an inference time of only 2-5 minutes, this framework achieves a better trade-off
between quality and speed than score distillation. MVEdit is highly versatile
and extendable, with a wide range of applications including text/image-to-3D
generation, 3D-to-3D editing, and high-quality texture synthesis. In
particular, evaluations demonstrate state-of-the-art performance in both
image-to-3D and text-guided texture generation tasks. Additionally, we
introduce a method for fine-tuning 2D latent diffusion models on small 3D
datasets with limited resources, enabling fast low-resolution text-to-3D
initialization.
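
The sampling loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function here (`toy_denoiser`, `lift_to_3d`, `render`, `sample`) and the `cond_weight` parameter are hypothetical stand-ins. Views are plain vectors, the "3D representation" is just the mean of the views, and conditioning is approximated by blending each view's denoised estimate with the previous render. Only the Euler ancestral update follows the standard form of that sampler.

```python
import numpy as np


def toy_denoiser(x, sigma, view_bias):
    """Hypothetical stand-in for an off-the-shelf 2D diffusion model.

    Returns a clean-image estimate per view. The per-view bias mimics the
    view-inconsistent predictions of independent 2D denoising.
    """
    return x / (1.0 + sigma ** 2) + view_bias


def lift_to_3d(views):
    """Toy 'lifting': fuse all views into one shared representation (their mean)."""
    return views.mean(axis=0)


def render(rep, n_views):
    """Toy 'rendering': every view observes the same shared representation."""
    return np.broadcast_to(rep, (n_views,) + rep.shape).copy()


def sample(init_views, sigmas, view_bias, cond_weight=0.5, seed=0):
    """Jointly denoise all views; sigmas must decrease and end at 0."""
    rng = np.random.default_rng(seed)
    n_views = init_views.shape[0]
    # SDEdit-style start: perturb the input views to the first noise level.
    x = init_views + sigmas[0] * rng.standard_normal(init_views.shape)
    rendered = None
    rep = lift_to_3d(init_views)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = toy_denoiser(x, sigma, view_bias)
        if rendered is not None:
            # Assumed conditioning: pull each view toward the previous render.
            denoised = (1.0 - cond_weight) * denoised + cond_weight * rendered
        # Lift this step's views to the shared representation, then render it
        # so the next step can be conditioned on consistent views.
        rep = lift_to_3d(denoised)
        rendered = render(rep, n_views)
        # Euler ancestral step: deterministic move to sigma_down, then fresh noise.
        sigma_up = np.sqrt(sigma_next ** 2 * (sigma ** 2 - sigma_next ** 2) / sigma ** 2)
        sigma_down = np.sqrt(sigma_next ** 2 - sigma_up ** 2)
        d = (x - denoised) / sigma
        x = x + d * (sigma_down - sigma)
        x = x + sigma_up * rng.standard_normal(x.shape)
    return x, rep


# Usage: 4 views with 8 "pixels" each, and a small per-view bias.
views = np.zeros((4, 8))
bias = np.linspace(-0.2, 0.2, 4)[:, None]
sigmas = np.array([2.0, 1.0, 0.5, 0.0])
final_views, final_rep = sample(views, sigmas, bias)
print("cross-view std:", final_views.std(axis=0).mean())
```

Setting `cond_weight=0.0` recovers independent per-view sampling, which makes it easy to compare cross-view spread with and without the toy adapter. In the actual method the lifting step builds a real 3D representation and the conditioning acts inside the 2D diffusion model, so this sketch only conveys the control flow.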