ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
March 19, 2024
Authors: Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia, Ziwei Liu
cs.AI
Abstract
Generating high-quality 3D assets from a given image is highly desirable in
various applications such as AR/VR. Recent advances in single-image 3D
generation explore feed-forward models that learn to infer the 3D model of an
object without optimization. Though promising results have been achieved in
single object generation, these methods often struggle to model complex 3D
assets that inherently contain multiple objects. In this work, we present
ComboVerse, a 3D generation framework that produces high-quality 3D assets with
complex compositions by learning to combine multiple models. 1) We first
perform an in-depth analysis of this "multi-object gap" from both model and
data perspectives. 2) Next, with reconstructed 3D models of different objects,
we seek to adjust their sizes, rotation angles, and locations to create a 3D
asset that matches the given image. 3) To automate this process, we apply
spatially-aware score distillation sampling (SSDS) from pretrained diffusion
models to guide the positioning of objects. Compared with standard score
distillation sampling, our proposed framework emphasizes the spatial alignment
of objects and thus achieves more accurate results. Extensive experiments
validate that ComboVerse achieves clear improvements over existing methods in
generating compositional 3D assets.
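As a rough illustration of step 2, the sketch below shows what adjusting one reconstructed object's size, rotation angle, and location could look like for a set of 3D points. It is not the authors' implementation: the names `Placement` and `place` are hypothetical, the uniform scale and the single rotation about the vertical (y) axis are simplifying assumptions, and the SSDS optimization that would choose these parameters is not shown.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass
class Placement:
    """Hypothetical per-object parameters (assumed, not from the paper)."""
    scale: float = 1.0                          # uniform size factor
    yaw: float = 0.0                            # rotation about the y axis, in radians
    translation: Point = (0.0, 0.0, 0.0)        # offset applied after rotation


def place(points: list[Point], p: Placement) -> list[Point]:
    """Scale, then rotate about the y axis, then translate each point."""
    c, s = math.cos(p.yaw), math.sin(p.yaw)
    tx, ty, tz = p.translation
    placed = []
    for x, y, z in points:
        # 1) resize the object
        x, y, z = p.scale * x, p.scale * y, p.scale * z
        # 2) rotate about the vertical axis (standard R_y matrix)
        xr = c * x + s * z
        zr = -s * x + c * z
        # 3) move the object to its location in the scene
        placed.append((xr + tx, y + ty, zr + tz))
    return placed


if __name__ == "__main__":
    corner = [(1.0, 0.0, 0.0)]
    print(place(corner, Placement(scale=2.0, yaw=math.pi / 2, translation=(0.0, 1.0, 0.0))))
```

In a pipeline of the kind the abstract describes, one such parameter set per object would be the quantity that gets tuned until the combined scene matches the input image.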