Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Abstract

Recently, 3D content creation from text prompts has made remarkable progress by utilizing 2D and 3D diffusion models. While 3D diffusion models ensure strong multi-view consistency, their ability to generate high-quality and diverse 3D assets is hindered by limited 3D data. In contrast, 2D diffusion models support a distillation approach that achieves excellent generalization and rich detail without any 3D data. However, 2D lifting methods suffer from inherent view-agnostic ambiguity, which leads to serious multi-face Janus issues: text prompts alone fail to provide sufficient guidance to learn coherent 3D results. Instead of retraining a costly viewpoint-aware model, we study how to fully exploit easily accessible coarse 3D knowledge to enhance the prompts and guide 2D lifting optimization for refinement. In this paper, we propose Sherpa3D, a new text-to-3D framework that achieves high fidelity, generalizability, and geometric consistency simultaneously. Specifically, we design a pair of guiding strategies derived from the coarse 3D prior generated by the 3D diffusion model: structural guidance for geometric fidelity and semantic guidance for 3D coherence. With these two types of guidance, the 2D diffusion model enriches the 3D content with diversified and high-quality results. Extensive experiments show the superiority of Sherpa3D over state-of-the-art text-to-3D methods in terms of quality and 3D consistency.
