DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation

Abstract

Text-to-image diffusion models pre-trained on billions of image-text pairs have recently enabled text-to-3D content creation by optimizing a randomly initialized Neural Radiance Field (NeRF) with score distillation. However, the resulting 3D models exhibit two limitations: (a) quality problems such as oversaturated colors and the Janus problem; (b) extremely low diversity compared with text-guided image synthesis. In this paper, we show that the conflict between the NeRF optimization process and uniform timestep sampling in score distillation is the main reason for these limitations. To resolve this conflict, we propose to prioritize timestep sampling with monotonically non-increasing functions, which aligns NeRF optimization with the sampling process of the diffusion model. Extensive experiments show that our simple redesign significantly improves the quality and diversity of text-to-3D content creation.
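The contrast between uniform timestep sampling and a monotonically non-increasing timestep schedule can be illustrated with a minimal sketch. The abstract does not specify the functional form used by the authors, so the linear decay below, the function names, and the value of `T_MAX` are all assumptions made purely for illustration.

```python
import random

T_MAX = 1000  # assumed number of diffusion timesteps (hypothetical value)


def uniform_timestep(rng, t_min=1, t_max=T_MAX):
    """Baseline: draw a timestep uniformly at random, ignoring the iteration index."""
    return rng.randint(t_min, t_max)


def non_increasing_timestep(step, total_steps, t_min=1, t_max=T_MAX):
    """Map an optimization step to a timestep that never increases as training proceeds.

    A linear decay is used here only as an example of a monotonically
    non-increasing function; it is an assumption, not the paper's schedule.
    """
    if total_steps < 2:
        return t_max
    frac = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return round(t_max - frac * (t_max - t_min))


rng = random.Random(0)
uniform_draws = [uniform_timestep(rng) for _ in range(5)]
schedule = [non_increasing_timestep(i, 5) for i in range(5)]
print("uniform:", uniform_draws)
print("non-increasing:", schedule)
```

Under this sketch, early optimization steps see large (noisy) timesteps and later steps see small ones, mirroring the coarse-to-fine order of diffusion sampling, whereas the uniform baseline mixes all noise levels at every stage.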
