Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
June 5, 2024
Authors: Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng
cs.AI
Abstract
Existing single-image-to-3D creation methods typically involve a two-stage process: first generating multi-view images, and then using these images for 3D reconstruction. However, training the two stages separately leads to significant data bias at the inference phase, which degrades the quality of the reconstructed results. We introduce a unified 3D generation framework, named Ouroboros3D, which integrates diffusion-based multi-view image generation and 3D reconstruction into a single recursive diffusion process. In our framework, the two modules are jointly trained through a self-conditioning mechanism, allowing them to adapt to each other's characteristics for robust inference. During the multi-view denoising process, the multi-view diffusion model uses the 3D-aware maps rendered by the reconstruction module at the previous timestep as additional conditions. This recursive diffusion framework with 3D-aware feedback unites the entire process and improves geometric consistency. Experiments show that our framework outperforms methods that train these two stages separately, as well as existing approaches that combine them only at the inference phase. Project page: https://costwen.github.io/Ouroboros3D/
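
The recursive feedback loop described in the abstract can be sketched in code. The following is a minimal illustration only: the class and method names (`MultiViewDiffusion.denoise`, `FeedForwardReconstructor`, `render_maps`, `scheduler.step`) are hypothetical placeholders standing in for the paper's components, not the authors' actual API.

```python
import torch

@torch.no_grad()
def ouroboros3d_sample(mv_diffusion, reconstructor, scheduler,
                       cond_image, num_views=6):
    """Sketch of the recursive diffusion loop: multi-view denoising and
    feed-forward 3D reconstruction alternate, with rendered 3D-aware maps
    from the previous timestep conditioning the next denoising step."""
    # Start from Gaussian noise for all target views.
    x_t = torch.randn(num_views, 3, 256, 256)
    aware_maps = None  # No 3D feedback is available at the first step.

    for t in scheduler.timesteps:
        # 1) Multi-view denoising, conditioned on the input image and,
        #    when available, the 3D-aware maps from the previous step.
        x0_pred = mv_diffusion.denoise(x_t, t, cond_image, aware_maps)

        # 2) Feed-forward 3D reconstruction from the current clean
        #    multi-view estimate.
        rep3d = reconstructor(x0_pred)

        # 3) Render 3D-aware maps that will condition the *next*
        #    denoising step, closing the recursive feedback loop.
        aware_maps = rep3d.render_maps(views=num_views)

        # 4) Standard diffusion update toward the next, less noisy timestep.
        x_t = scheduler.step(x0_pred, t, x_t)

    return x0_pred, rep3d
```

Because the two modules are jointly trained with this self-conditioning in the loop, the diffusion model learns to expect (possibly imperfect) reconstruction feedback, which is what mitigates the train/inference data bias of separately trained two-stage pipelines.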