Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
June 30, 2023
Authors: Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem
cs.AI
Abstract
We present Magic123, a two-stage coarse-to-fine approach for high-quality,
textured 3D mesh generation from a single unposed image in the wild using
both 2D and 3D priors. In the first stage, we optimize a neural radiance field
to produce a coarse geometry. In the second stage, we adopt a memory-efficient
differentiable mesh representation to yield a high-resolution mesh with a
visually appealing texture. In both stages, the 3D content is learned through
reference view supervision and novel views guided by a combination of 2D and 3D
diffusion priors. We introduce a single trade-off parameter between the 2D and
3D priors to control exploration (more imaginative) and exploitation (more
precise) of the generated geometry. Additionally, we employ textual inversion
and monocular depth regularization to encourage consistent appearances across
views and to prevent degenerate solutions, respectively. Magic123 demonstrates
a significant improvement over previous image-to-3D techniques, as validated
through extensive experiments on synthetic benchmarks and diverse real-world
images. Our code, models, and generated 3D assets are available at
https://github.com/guochengqian/Magic123.
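To make the role of the single trade-off parameter more concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation) of how a weight on the 2D prior could blend 2D and 3D diffusion guidance on novel views. The functions `sds_loss_2d` and `sds_loss_3d` are placeholder stubs standing in for the actual score-distillation losses, and keeping the 3D term at unit weight is an assumption made only for illustration.

```python
# Hypothetical sketch: blending 2D and 3D diffusion guidance with one
# trade-off weight. The two loss stubs below are placeholders, not the
# real score-distillation terms used by Magic123.

import torch


def sds_loss_2d(rendered: torch.Tensor) -> torch.Tensor:
    # Placeholder for a score-distillation loss from a 2D diffusion prior.
    return rendered.pow(2).mean()


def sds_loss_3d(rendered: torch.Tensor) -> torch.Tensor:
    # Placeholder for a score-distillation loss from a 3D-aware diffusion prior.
    return (rendered - rendered.mean()).abs().mean()


def novel_view_loss(rendered: torch.Tensor, lambda_2d: float) -> torch.Tensor:
    """Larger lambda_2d -> more exploration (more imaginative geometry);
    smaller lambda_2d -> more exploitation (more precise, 3D-consistent
    geometry). The 3D term keeps a fixed unit weight in this sketch."""
    return lambda_2d * sds_loss_2d(rendered) + sds_loss_3d(rendered)


# Usage: during optimization, this term would be combined with the
# reference-view reconstruction loss described in the abstract.
rendered = torch.rand(1, 3, 128, 128, requires_grad=True)
loss = novel_view_loss(rendered, lambda_2d=1.0)
loss.backward()
```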