IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

Abstract

Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. One mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In this paper, we further explore the design space of text-to-3D models. We significantly improve multi-view generation by considering video generators instead of image generators. We combine this with a 3D reconstruction algorithm that uses Gaussian splatting to optimize a robust image-based loss, and so produce high-quality 3D outputs directly from the generated views. Our new method, IM-3D, reduces the number of evaluations of the 2D generator network by a factor of 10-100, resulting in a much more efficient pipeline, better quality, fewer geometric inconsistencies, and a higher yield of usable 3D assets.

