ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Abstract

Generating high-quality 3D assets from a given image is highly desirable in applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. Although these methods achieve promising results on single objects, they often struggle to model complex 3D assets that inherently contain multiple objects. In this work, we present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models. 1) We first perform an in-depth analysis of this "multi-object gap" from both model and data perspectives. 2) Next, given reconstructed 3D models of the individual objects, we adjust their sizes, rotation angles, and locations to create a 3D asset that matches the given image. 3) To automate this process, we apply spatially-aware score distillation sampling (SSDS) from pretrained diffusion models to guide the positioning of objects. Compared with standard score distillation sampling, our framework emphasizes the spatial alignment of objects and thus achieves more accurate results. Extensive experiments validate that ComboVerse achieves clear improvements over existing methods in generating compositional 3D assets.
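Step 2 of the abstract, adjusting each object's size, rotation angle, and location before combining the objects into one asset, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Placement` class, the `place` and `combine` helpers, the uniform scale, and the restriction to a single rotation about the vertical (y) axis are all assumptions made for illustration, since the abstract does not specify the parameterization. In the framework these parameters would be guided by SSDS; here they are simply fixed by hand.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass
class Placement:
    """Hypothetical per-object parameters (names are illustrative only)."""
    scale: float = 1.0                  # uniform size factor
    yaw: float = 0.0                    # rotation about the y axis, in radians
    offset: Point = (0.0, 0.0, 0.0)     # translation applied last


def place(vertices: list[Point], p: Placement) -> list[Point]:
    """Scale, then rotate about the y axis, then translate each vertex."""
    c, s = math.cos(p.yaw), math.sin(p.yaw)
    placed = []
    for x, y, z in vertices:
        x, y, z = p.scale * x, p.scale * y, p.scale * z
        # Rotation about y: x' = c*x + s*z, z' = -s*x + c*z (y is unchanged).
        xr, zr = c * x + s * z, -s * x + c * z
        placed.append((xr + p.offset[0], y + p.offset[1], zr + p.offset[2]))
    return placed


def combine(objects: list[list[Point]], placements: list[Placement]) -> list[Point]:
    """Place every object with its own parameters and merge the vertices."""
    scene: list[Point] = []
    for vertices, p in zip(objects, placements):
        scene.extend(place(vertices, p))
    return scene


# Usage: two single-vertex "objects" arranged into one scene.
scene = combine(
    [[(1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)]],
    [Placement(scale=2.0, yaw=math.pi / 2, offset=(0.0, 1.0, 0.0)), Placement()],
)
```

An optimization-based method would treat the fields of each `Placement` as the variables to update, which is why the sketch keeps them separate from the object geometry.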
