Aligning Diffusion Models with Noise-Conditioned Perception

Abstract

Recent advances in human preference optimization, initially developed for Language Models (LMs), have shown promise for text-to-image Diffusion Models, enhancing prompt alignment, visual appeal, and user preference. Unlike LMs, Diffusion Models typically optimize in pixel or VAE space, which does not align well with human perception and leads to slower, less efficient training during the preference alignment stage. To address these issues, we propose using a perceptual objective in the U-Net embedding space of the diffusion model. Our approach fine-tunes Stable Diffusion 1.5 and XL using Direct Preference Optimization (DPO), Contrastive Preference Optimization (CPO), and supervised fine-tuning (SFT) within this embedding space. This method significantly outperforms standard latent-space implementations across various metrics, including quality and computational cost. For SDXL, our approach achieves 60.8% general preference, 62.2% visual appeal, and 52.1% prompt following against the original open-sourced SDXL-DPO on the PartiPrompts dataset, while significantly reducing compute. Our approach not only improves the efficiency and quality of human preference alignment for diffusion models but also integrates easily with other optimization techniques. The training code and LoRA weights will be available here: https://huggingface.co/alexgambashidze/SDXL_NCP-DPO_v0.1
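
The abstract does not state the objective itself, so the following is only a minimal sketch of what a DPO-style preference loss measured in an embedding space could look like. It assumes the commonly used Diffusion-DPO form, in which the model's error on the preferred ("winner") and rejected ("loser") samples is compared with that of a frozen reference model. The function `embed` is a hypothetical stand-in for the U-Net encoder features; the noise-level conditioning, the name `perceptual_dpo_loss`, and the parameter `beta` are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def squared_error(a: Vector, b: Vector) -> float:
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def perceptual_dpo_loss(
    embed: Callable[[Vector], Vector],  # hypothetical perceptual encoder
    target_w: Vector,  # denoising target for the preferred sample
    target_l: Vector,  # denoising target for the rejected sample
    pred_w: Vector,    # trainable model's prediction, preferred sample
    pred_l: Vector,    # trainable model's prediction, rejected sample
    ref_w: Vector,     # frozen reference prediction, preferred sample
    ref_l: Vector,     # frozen reference prediction, rejected sample
    beta: float = 1.0,  # assumed temperature of the preference term
) -> float:
    # Errors are measured between embeddings rather than raw latents.
    model_err_w = squared_error(embed(pred_w), embed(target_w))
    model_err_l = squared_error(embed(pred_l), embed(target_l))
    ref_err_w = squared_error(embed(ref_w), embed(target_w))
    ref_err_l = squared_error(embed(ref_l), embed(target_l))

    # Negative margin means the model improved on the winner more than
    # on the loser, relative to the reference.
    margin = (model_err_w - ref_err_w) - (model_err_l - ref_err_l)

    # -log(sigmoid(-beta * margin)) == softplus(beta * margin)
    return softplus(beta * margin)
```

With an identity `embed`, this reduces to a plain latent-space preference loss; substituting a perceptual encoder changes only the space in which the errors are measured, which is the idea the abstract describes.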

