Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

December 7, 2023
Authors: Lihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue
cs.AI

Abstract

Most 3D generation research focuses on up-projecting 2D foundation models into the 3D space, either by minimizing 2D Score Distillation Sampling (SDS) loss or fine-tuning on multi-view datasets. Without explicit 3D priors, these methods often lead to geometric anomalies and multi-view inconsistency. Recently, researchers have attempted to improve the genuineness of 3D objects by directly training on 3D datasets, albeit at the cost of low-quality texture generation due to the limited texture diversity in 3D datasets. To harness the advantages of both approaches, we propose Bidirectional Diffusion (BiDiff), a unified framework that incorporates both a 3D and a 2D diffusion process, preserving 3D fidelity and 2D texture richness, respectively. Moreover, as a simple combination may yield inconsistent generation results, we further bridge the two processes with novel bidirectional guidance. In addition, our method can be used as an initialization for optimization-based models, further improving the quality of the 3D model and the efficiency of optimization, and reducing the generation process from 3.4 hours to 20 minutes. Experimental results show that our model achieves high-quality, diverse, and scalable 3D generation. Project website: https://bidiff.github.io/.
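The abstract describes a joint sampling procedure in which a 3D diffusion branch and a 2D diffusion branch denoise in parallel and guide each other at every step. Below is a minimal conceptual sketch of such a bidirectional sampling loop, assuming a simplified linear update rule; the TinyDenoiser modules, the cross-domain projection layers, and the guidance weight `w` are hypothetical placeholders for illustration only, not the authors' architecture or training setup.

```python
# Conceptual sketch of bidirectional guidance between a 3D and a 2D diffusion
# branch. All networks and the schedule here are hypothetical stand-ins.
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Hypothetical stand-in for a learned denoiser (3D or 2D branch)."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x, t):
        # Concatenate a scalar timestep embedding and predict the noise.
        t_embed = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_embed], dim=-1))


def bidiff_sample(denoiser_3d, denoiser_2d, steps=50, dim_3d=32, dim_2d=48, w=0.1):
    """Jointly denoise a 3D latent and a 2D (multi-view) latent.

    At each step, the partially denoised state of one branch is injected as a
    guidance term into the other (the w-weighted cross terms), which is the
    high-level idea of the bidirectional guidance described in the abstract.
    """
    x3d = torch.randn(1, dim_3d)  # noisy 3D representation (e.g. a shape latent)
    x2d = torch.randn(1, dim_2d)  # noisy 2D representation (e.g. view latents)
    proj_2d_to_3d = nn.Linear(dim_2d, dim_3d)  # hypothetical cross-domain maps
    proj_3d_to_2d = nn.Linear(dim_3d, dim_2d)

    for step in reversed(range(steps)):
        t = torch.tensor([[step / steps]])
        eps_3d = denoiser_3d(x3d, t)
        eps_2d = denoiser_2d(x2d, t)
        # Each branch takes a denoising step and is nudged toward consistency
        # with the other branch's current estimate.
        x3d = x3d - eps_3d / steps + w * (proj_2d_to_3d(x2d) - x3d) / steps
        x2d = x2d - eps_2d / steps + w * (proj_3d_to_2d(x3d) - x2d) / steps
    return x3d, x2d


shape_latent, view_latent = bidiff_sample(TinyDenoiser(32), TinyDenoiser(48))
print(shape_latent.shape, view_latent.shape)
```

In the actual framework, the two branches would operate on a 3D representation and multi-view images, be conditioned on the text prompt, and the resulting sample could then serve as the initialization for an optimization-based refinement stage, as the abstract notes.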