Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
June 30, 2023
Authors: Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem
cs.AI
Abstract
We present Magic123, a two-stage coarse-to-fine approach for generating high-quality,
textured 3D meshes from a single unposed image in the wild using
both 2D and 3D priors. In the first stage, we optimize a neural radiance field
to produce a coarse geometry. In the second stage, we adopt a memory-efficient
differentiable mesh representation to yield a high-resolution mesh with a
visually appealing texture. In both stages, the 3D content is learned through
reference view supervision and novel views guided by a combination of 2D and 3D
diffusion priors. We introduce a single trade-off parameter between the 2D and
3D priors to control exploration (more imaginative) and exploitation (more
precise) of the generated geometry. Additionally, we employ textual inversion
and monocular depth regularization to encourage consistent appearances across
views and to prevent degenerate solutions, respectively. Magic123 demonstrates
a significant improvement over previous image-to-3D techniques, as validated
through extensive experiments on synthetic benchmarks and diverse real-world
images. Our code, models, and generated 3D assets are available at
https://github.com/guochengqian/Magic123.
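
For intuition, the optimization objective described in the abstract (reference-view supervision plus novel-view guidance from 2D and 3D diffusion priors, balanced by a single trade-off weight, with monocular depth regularization) can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' released implementation: the names `joint_guidance_loss`, `sds_loss_2d`, `sds_loss_3d`, `lam`, and `lambda_depth` are assumptions for illustration only.

```python
import torch.nn.functional as F

def joint_guidance_loss(render_ref, image_ref, depth_ref, render_novel,
                        sds_loss_2d, sds_loss_3d,
                        lam=1.0, lambda_depth=0.1):
    """Illustrative combination of reference-view supervision and 2D/3D diffusion guidance.

    `sds_loss_2d` and `sds_loss_3d` are assumed to be callables returning
    score-distillation losses for a rendered novel view; `lam` plays the role
    of the single trade-off parameter between the 2D (more imaginative) and
    3D (more precise) priors. All names and weights here are hypothetical.
    """
    loss_rec = F.mse_loss(render_ref["rgb"], image_ref)      # reference-view appearance loss
    loss_depth = F.l1_loss(render_ref["depth"], depth_ref)   # monocular depth regularization
    loss_2d = sds_loss_2d(render_novel)                      # 2D diffusion prior (exploration)
    loss_3d = sds_loss_3d(render_novel)                      # 3D diffusion prior (exploitation)
    return loss_rec + lambda_depth * loss_depth + lam * loss_2d + loss_3d
```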