Score Distillation Sampling with Learned Manifold Corrective

Abstract

Score Distillation Sampling (SDS) is a recent but already widely popular method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different factors and isolate the component responsible for noisy gradients. In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects. Instead, we train a shallow network mimicking the timestep-dependent denoising deficiency of the image diffusion model in order to effectively factor it out. We demonstrate the versatility and the effectiveness of our novel loss formulation through several qualitative and quantitative experiments, including optimization-based image synthesis and editing, zero-shot image translation network training, and text-to-3D synthesis.
