Aligning Diffusion Models with Noise-Conditioned Perception
June 25, 2024
Authors: Alexander Gambashidze, Anton Kulikov, Yuriy Sosnin, Ilya Makarov
cs.AI
Abstract
Recent advancements in human preference optimization, initially developed for
Language Models (LMs), have shown promise for text-to-image Diffusion Models,
enhancing prompt alignment, visual appeal, and user preference. Unlike LMs,
Diffusion Models are typically optimized in pixel or VAE space, which does not align
well with human perception, leading to slower and less efficient training
during the preference alignment stage. We propose using a perceptual objective
in the U-Net embedding space of the diffusion model to address these issues.
Our approach involves fine-tuning Stable Diffusion 1.5 and XL using Direct
Preference Optimization (DPO), Contrastive Preference Optimization (CPO), and
supervised fine-tuning (SFT) within this embedding space. This method
significantly outperforms standard latent-space implementations across various
metrics, including quality and computational cost. For SDXL, our approach
achieves 60.8% general preference, 62.2% visual appeal, and 52.1% prompt
following against the original open-sourced SDXL-DPO on the PartiPrompts dataset,
while significantly reducing compute. Our approach not only improves the
efficiency and quality of human preference alignment for diffusion models but
is also easily integrable with other optimization techniques. The training code
and LoRA weights will be available here:
https://huggingface.co/alexgambashidze/SDXL_NCP-DPO_v0.1
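The abstract's key idea is to compute the preference-optimization objective in the U-Net embedding space rather than in pixel or VAE latent space. Below is a minimal, hypothetical sketch of what such a noise-conditioned perceptual DPO loss could look like; `embed_fn`, the tensor shapes, and the `beta` value are illustrative assumptions and not the authors' released implementation (see the linked repository for that).

```python
# A minimal sketch (assumptions, not the authors' released code): a Diffusion-DPO-style
# loss in which each sample's denoising error is measured in a perceptual embedding
# space instead of raw pixel/VAE-latent space. `embed_fn` is a stand-in for a frozen,
# noise-conditioned feature extractor (e.g. early U-Net encoder activations).
import torch
import torch.nn.functional as F


def ncp_dpo_loss(pred_w, pred_l,        # policy predictions for preferred / rejected images
                 ref_w, ref_l,          # frozen reference-model predictions
                 target_w, target_l,    # denoising targets (e.g. the added noise)
                 embed_fn,              # frozen callable mapping a tensor to embeddings
                 beta=2500.0):          # DPO temperature; value purely illustrative
    def perceptual_err(pred, target):
        # Squared error in embedding space, averaged over all non-batch dimensions.
        return (embed_fn(pred) - embed_fn(target)).pow(2).flatten(1).mean(dim=1)

    err_w, err_l = perceptual_err(pred_w, target_w), perceptual_err(pred_l, target_l)
    rerr_w, rerr_l = perceptual_err(ref_w, target_w), perceptual_err(ref_l, target_l)

    # Standard Diffusion-DPO objective applied to the embedding-space errors:
    # reward lower error than the reference on preferred samples, not on rejected ones.
    inside = -beta * ((err_w - rerr_w) - (err_l - rerr_l))
    return -F.logsigmoid(inside).mean()


# Toy usage with a random convolution standing in for the frozen feature extractor.
embed = torch.nn.Conv2d(4, 8, 3, padding=1).eval().requires_grad_(False)
rand = lambda: torch.randn(2, 4, 64, 64)
print(ncp_dpo_loss(rand(), rand(), rand(), rand(), rand(), rand(), embed).item())
```

In the latent-space variant of Diffusion-DPO, `perceptual_err` would simply be a mean squared error on the raw tensors; the abstract's claim is that measuring it in a noise-conditioned embedding space aligns better with human perception and makes preference training faster and more effective.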