Score Distillation Sampling with Learned Manifold Corrective

Abstract

Score Distillation Sampling (SDS) is a recent but already widely used method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different factors and isolate the component responsible for noisy gradients. In the original formulation, high text guidance is used to compensate for the noise, leading to unwanted side effects. Instead, we train a shallow network that mimics the timestep-dependent denoising deficiency of the image diffusion model, which lets us effectively factor that deficiency out. We demonstrate the versatility and effectiveness of our novel loss formulation through several qualitative and quantitative experiments, including optimization-based image synthesis and editing, zero-shot image translation network training, and text-to-3D synthesis.
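
For orientation, the decomposition described in the abstract can be sketched with the commonly used form of the SDS gradient under classifier-free guidance. The notation is an assumption of this sketch and does not come from the abstract: \theta are the optimized parameters, x the rendered image, z_t its noised version at timestep t, \epsilon the injected noise, \hat\epsilon_\phi the diffusion model's noise prediction, y the text prompt, \varnothing the empty prompt, \omega the guidance weight, and w(t) a timestep weighting.

```latex
% Commonly used SDS gradient (sketch; notation assumed, not from the abstract)
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat\epsilon^{\omega}_\phi(z_t; y, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]

% Classifier-free guidance combines conditional and unconditional predictions
\hat\epsilon^{\omega}_\phi(z_t; y, t)
  = \hat\epsilon_\phi(z_t; \varnothing, t)
  + \omega\,\bigl(\hat\epsilon_\phi(z_t; y, t) - \hat\epsilon_\phi(z_t; \varnothing, t)\bigr)

% Substituting splits the residual into a text-dependent term and a denoising term
\hat\epsilon^{\omega}_\phi(z_t; y, t) - \epsilon
  = \underbrace{\omega\,\bigl(\hat\epsilon_\phi(z_t; y, t) - \hat\epsilon_\phi(z_t; \varnothing, t)\bigr)}_{\text{text-dependent term}}
  + \underbrace{\bigl(\hat\epsilon_\phi(z_t; \varnothing, t) - \epsilon\bigr)}_{\text{denoising term}}
```

On this reading, the denoising term is plausibly the component the abstract associates with noisy gradients: a large \omega scales up only the text-dependent term, so it outweighs the denoising term rather than removing it. The abstract's learned shallow network would then be a way to model and factor out that term directly, though the exact formulation used in the paper is not given here.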
