ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

March 19, 2024
Authors: Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia, Ziwei Liu
cs.AI

Abstract

Generating high-quality 3D assets from a given image is highly desirable in applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. Although these methods achieve promising results on single objects, they often struggle to model complex 3D assets that inherently contain multiple objects. In this work, we present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models. 1) We first perform an in-depth analysis of this "multi-object gap" from both model and data perspectives. 2) Next, with reconstructed 3D models of different objects, we seek to adjust their sizes, rotation angles, and locations to create a 3D asset that matches the given image. 3) To automate this process, we apply spatially-aware score distillation sampling (SSDS) from pretrained diffusion models to guide the positioning of objects. Compared with standard score distillation sampling, our proposed framework emphasizes the spatial alignment of objects and thus achieves more accurate results. Extensive experiments validate that ComboVerse achieves clear improvements over existing methods in generating compositional 3D assets.
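Step 2 of the abstract describes adjusting each object's size, rotation, and location. The sketch below illustrates only what such a per-object placement might look like when applied to a set of 3D points. It is not the authors' implementation: the `Placement` class, the `place` function, and the restriction to a single rotation angle about the vertical axis are assumptions made for illustration, and the SSDS guidance that would choose these parameters is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Placement:
    """Hypothetical per-object parameters (not taken from the paper)."""
    scale: float = 1.0                    # uniform size factor
    yaw: float = 0.0                      # rotation about the y axis, in radians
    offset: Point = (0.0, 0.0, 0.0)       # translation applied last


def place(points: List[Point], p: Placement) -> List[Point]:
    """Scale each point, rotate it about the y axis, then translate it."""
    c, s = math.cos(p.yaw), math.sin(p.yaw)
    placed = []
    for x, y, z in points:
        # Uniform scaling about the origin.
        x, y, z = p.scale * x, p.scale * y, p.scale * z
        # Rotation about y: [[c, 0, s], [0, 1, 0], [-s, 0, c]].
        xr = c * x + s * z
        zr = -s * x + c * z
        # Translation.
        placed.append((xr + p.offset[0], y + p.offset[1], zr + p.offset[2]))
    return placed
```

In a pipeline like the one the abstract outlines, the three fields of `Placement` would be the quantities an optimizer updates per object, while the reconstructed geometry itself stays fixed.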
