Score Distillation Sampling with Learned Manifold Corrective

Abstract

Score Distillation Sampling (SDS) is a recent but already widely popular method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different factors and isolate the component responsible for noisy gradients. In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects. Instead, we train a shallow network mimicking the timestep-dependent denoising deficiency of the image diffusion model in order to effectively factor it out. We demonstrate the versatility and the effectiveness of our novel loss formulation through several qualitative and quantitative experiments, including optimization-based image synthesis and editing, zero-shot image translation network training, and text-to-3D synthesis.
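
For orientation, the decomposition mentioned in the abstract can be sketched with the commonly used SDS gradient under classifier-free guidance. The notation is an assumption borrowed from the standard SDS literature, not taken from this page: \theta are the optimized parameters, x the rendered image, z_t its noised version at timestep t, y the text prompt, \epsilon_\phi the diffusion model's noise prediction, \omega the guidance weight, and w(t) a timestep weighting. The paper's own factorization may be stated differently.

```latex
% Standard SDS gradient (assumed notation):
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
      \bigl(\hat{\epsilon}_\omega(z_t; y, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta} \right]

% Classifier-free guidance combines conditional and unconditional predictions:
\hat{\epsilon}_\omega(z_t; y, t)
  = \epsilon_\phi(z_t; \varnothing, t)
  + \omega\,\bigl(\epsilon_\phi(z_t; y, t) - \epsilon_\phi(z_t; \varnothing, t)\bigr)

% Substituting gives one natural split of the residual into two terms:
\hat{\epsilon}_\omega(z_t; y, t) - \epsilon
  = \underbrace{\omega\,\bigl(\epsilon_\phi(z_t; y, t) - \epsilon_\phi(z_t; \varnothing, t)\bigr)}_{\text{text-conditioning term}}
  + \underbrace{\bigl(\epsilon_\phi(z_t; \varnothing, t) - \epsilon\bigr)}_{\text{denoising residual}}
```

Under this reading, raising \omega scales only the first term, which is consistent with the abstract's remark that high text guidance is used to outweigh a noisy component rather than to remove it.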
