Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Abstract

In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, other works directly produce 3D information via fast network inference, but their results are often of low quality and lack geometric detail. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure consistency, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and reasonably good efficiency compared to prior works.
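The abstract does not spell out how the multi-view cross-domain attention is built. The sketch below illustrates only the general idea of "information exchange across views and modalities" under the simplest possible assumption: full self-attention over the concatenated tokens of all views and both domains (normal and color). It is not Wonder3D's implementation. The function name `joint_view_domain_attention`, the `(domains, views, tokens, channels)` feature layout, and the random projection weights are all hypothetical and were chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_view_domain_attention(feats, w_q, w_k, w_v):
    """Hypothetical joint attention over views and domains (illustration only).

    feats: array of shape (domains, views, tokens, channels); here
           domains=2 stands for (normal, color).
    w_q, w_k, w_v: (channels, channels) projection matrices.
    Returns an array with the same shape as feats.
    """
    d, v, t, c = feats.shape
    # Flatten domains, views and tokens into one sequence so that every
    # token can attend to every other token, whatever its view or domain.
    x = feats.reshape(d * v * t, c)
    q, k, val = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product attention over the whole joint sequence.
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)
    out = attn @ val
    return out.reshape(d, v, t, c)

# Usage: 2 domains, 6 views, 16 tokens per view, 32 channels (assumed sizes).
rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 6, 16, 32))
w = [rng.standard_normal((32, 32)) / np.sqrt(32) for _ in range(3)]
out = joint_view_domain_attention(feats, *w)
print(out.shape)  # (2, 6, 16, 32)
```

Because every token attends to every other token, perturbing the normal features of one view changes the color output of a different view. That is the cross-view, cross-modality coupling that the abstract describes. The cost of this naive variant grows quadratically with the total number of tokens (domains × views × tokens). A practical model may therefore factor the attention into separate multi-view and cross-domain steps instead.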
