Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
June 5, 2024
Authors: Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng
cs.AI
Abstract
Existing single image-to-3D creation methods typically involve a two-stage
process, first generating multi-view images, and then using these images for 3D
reconstruction. However, training these two stages separately leads to
significant data bias in the inference phase, thus affecting the quality of
reconstructed results. We introduce a unified 3D generation framework, named
Ouroboros3D, which integrates diffusion-based multi-view image generation and
3D reconstruction into a recursive diffusion process. In our framework, these
two modules are jointly trained through a self-conditioning mechanism, allowing
them to adapt to each other's characteristics for robust inference. During the
multi-view denoising process, the multi-view diffusion model uses the 3D-aware
maps rendered by the reconstruction module at the previous timestep as
additional conditions. The recursive diffusion framework with 3D-aware feedback
unites the entire process and improves geometric consistency. Experiments show
that our framework outperforms training these two stages separately, as well as
existing methods that combine them only at the inference phase. Project page:
https://costwen.github.io/Ouroboros3D/
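
The recursive loop described in the abstract can be summarized in a few lines of pseudocode. The sketch below is illustrative only: `mv_diffusion`, `reconstructor`, and `scheduler` are hypothetical placeholders for the paper's multi-view diffusion model, feed-forward reconstruction module, and noise scheduler, and all tensor shapes and method names are assumptions rather than the authors' actual API.

```python
import torch

def ouroboros3d_sample(mv_diffusion, reconstructor, scheduler, cond_image):
    """Minimal sketch of the Ouroboros3D sampling loop (hypothetical API).

    At each denoising step, the multi-view diffusion model is conditioned on
    3D-aware maps rendered from the 3D representation reconstructed at the
    previous step, closing the generation-reconstruction feedback loop.
    """
    # Start from pure noise for the target multi-view images.
    # Shape (batch, channels, views, H, W) is purely illustrative.
    x_t = torch.randn(1, 4, 8, 64, 64)
    feedback = None  # no 3D-aware maps exist before the first step

    for t in scheduler.timesteps:
        # Denoise the multi-view images, conditioned on the input image and,
        # after the first step, on the rendered 3D-aware maps (self-conditioning).
        x0_pred = mv_diffusion(x_t, t, image=cond_image, extra_cond=feedback)

        # Reconstruct an explicit 3D representation from the current multi-view
        # estimate, then render 3D-aware maps to feed into the next step.
        scene = reconstructor(x0_pred)
        feedback = scene.render_maps()

        # Standard scheduler update toward the next (less noisy) timestep.
        x_t = scheduler.step(x0_pred, t, x_t)

    return scene  # final 3D representation after the last denoising step
```

Because the reconstruction module sees intermediate denoised outputs during joint training, it adapts to the diffusion model's characteristics, which is what the abstract credits for robust inference compared with stitching the two stages together only at test time.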