LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

Abstract

Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, they cannot capture high-frequency geometry and texture details and struggle to scale to large prompt sets, so they generalize poorly. We introduce LATTE3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set. Key to our method is 1) building a scalable architecture and 2) leveraging 3D data during optimization through 3D-aware diffusion priors, shape regularization, and model initialization to achieve robustness to diverse and complex training prompts. LATTE3D amortizes both neural field and textured surface generation to produce highly detailed textured meshes in a single forward pass. LATTE3D generates 3D objects in 400ms, and can be further enhanced with fast test-time optimization.
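The idea of amortization mentioned above can be illustrated with a toy sketch. It is not the LATTE3D or ATT3D implementation: the 2-D prompt embeddings, the scalar output, the quadratic loss, and the `target` function are all hypothetical stand-ins, chosen only to show one shared model being trained across many prompts and then answering an unseen prompt in a single forward pass.

```python
import random

def target(c):
    # Hypothetical per-prompt optimum. In a real text-to-3D system the optimum
    # is only implicit in a loss (e.g. one driven by a diffusion prior).
    return 2.0 * c[0] - 1.0 * c[1] + 0.5

def per_prompt_loss_grad(theta, c):
    # Gradient of the toy loss (theta - target(c))^2 with respect to theta.
    return 2.0 * (theta - target(c))

def forward(params, c):
    # Amortized model: one shared linear map from prompt embedding to output.
    w0, w1, b = params
    return w0 * c[0] + w1 * c[1] + b

def train_amortized(prompts, steps=2000, lr=0.05, seed=0):
    # Optimize the shared parameters over many prompts instead of
    # solving a separate optimization problem for each prompt.
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]
    for _ in range(steps):
        c = rng.choice(prompts)
        g = per_prompt_loss_grad(forward(params, c), c)
        # Chain rule: d(loss)/d(params) = g * d(theta)/d(params).
        params[0] -= lr * g * c[0]
        params[1] -= lr * g * c[1]
        params[2] -= lr * g
    return params

rng = random.Random(1)
train_prompts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(32)]
params = train_amortized(train_prompts)

# Inference on a prompt not seen in training: a single forward pass,
# with no per-prompt optimization loop.
unseen = (0.3, -0.7)
prediction = forward(params, unseen)
error = abs(prediction - target(unseen))
```

In this toy setting the training cost is paid once and shared across prompts. A per-prompt method would instead rerun an optimization loop for every new input, which is the cost the abstract describes as taking up to an hour per prompt.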
