Aligning Diffusion Models with Noise-Conditioned Perception
June 25, 2024
Authors: Alexander Gambashidze, Anton Kulikov, Yuriy Sosnin, Ilya Makarov
cs.AI
Abstract
Recent advancements in human preference optimization, initially developed for
Language Models (LMs), have shown promise for text-to-image Diffusion Models,
enhancing prompt alignment, visual appeal, and user preference. Unlike LMs,
Diffusion Models typically optimize in pixel or VAE space, which does not align
well with human perception, leading to slower and less efficient training
during the preference alignment stage. We propose using a perceptual objective
in the U-Net embedding space of the diffusion model to address these issues.
Our approach involves fine-tuning Stable Diffusion 1.5 and XL using Direct
Preference Optimization (DPO), Contrastive Preference Optimization (CPO), and
supervised fine-tuning (SFT) within this embedding space. This method
significantly outperforms standard latent-space implementations across various
metrics, including quality and computational cost. For SDXL, our approach
achieves win rates of 60.8% for general preference, 62.2% for visual appeal, and
52.1% for prompt following against the original open-sourced SDXL-DPO on the PartiPrompts dataset,
while significantly reducing compute. Our approach not only improves the
efficiency and quality of human preference alignment for diffusion models but
is also easily integrable with other optimization techniques. The training code
and LoRA weights will be available here:
https://huggingface.co/alexgambashidze/SDXL_NCP-DPO_v0.1
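
To make the core idea concrete, the sketch below shows how a DPO-style preference loss can be computed in a perceptual embedding space rather than in raw pixel or VAE-latent space. This is a minimal illustration under stated assumptions, not the authors' released code: `FeatureEncoder` is a toy stand-in for the frozen, noise-conditioned U-Net feature extractor the abstract refers to, and `beta`, the tensor shapes, and all function names are illustrative.

```python
# Minimal sketch (illustrative, not the authors' implementation) of a DPO-style
# preference objective evaluated in a perceptual embedding space instead of
# pixel/VAE-latent space.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureEncoder(nn.Module):
    """Toy stand-in for a frozen noise-conditioned perceptual encoder
    (e.g. intermediate U-Net features in the actual method)."""
    def __init__(self, in_ch: int = 4, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)


def ncp_dpo_loss(encoder, pred_w, pred_l, ref_pred_w, ref_pred_l,
                 target_w, target_l, beta=2500.0):
    """DPO-style objective where the denoising error is measured in the
    encoder's embedding space.

    pred_*     : predictions of the model being fine-tuned (winner / loser samples)
    ref_pred_* : predictions of the frozen reference model
    target_*   : the corresponding regression targets (e.g. the sampled noise)
    beta       : preference temperature (illustrative value)
    """
    def perceptual_err(pred, target):
        # Per-sample distance between prediction and target, taken after
        # projecting both through the perceptual encoder.
        return F.mse_loss(encoder(pred), encoder(target),
                          reduction="none").mean(dim=(1, 2, 3))

    # How much the fine-tuned model improves over the reference, per sample.
    diff_w = perceptual_err(pred_w, target_w) - perceptual_err(ref_pred_w, target_w)
    diff_l = perceptual_err(pred_l, target_l) - perceptual_err(ref_pred_l, target_l)

    # Standard Bradley-Terry / DPO objective on the (winner - loser) margin.
    return -F.logsigmoid(-beta * (diff_w - diff_l)).mean()


if __name__ == "__main__":
    enc = FeatureEncoder().eval()
    for p in enc.parameters():
        p.requires_grad_(False)  # the perceptual encoder stays frozen

    shape = (2, 4, 32, 32)  # batch of SD-style latents (illustrative)
    args = [torch.randn(shape) for _ in range(6)]
    print(ncp_dpo_loss(enc, *args).item())
```

The only difference from a plain latent-space Diffusion-DPO loss in this sketch is that the per-sample errors are computed after projecting predictions and targets through the perceptual encoder, which is the high-level shape of the noise-conditioned perception objective described above.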