

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

February 15, 2024
Authors: Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu
cs.AI

Abstract

Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.
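To make the self-play idea more concrete, below is a minimal sketch of what one SPIN-style update for a diffusion model could look like. It assumes a diffusers-style ε-prediction UNet and a DDPM noise scheduler; all names (`unet`, `unet_ref`, `noise_scheduler`, `beta`) are illustrative placeholders rather than the authors' implementation, and the loss shown is a logistic (DPO-style) surrogate on denoising errors, not the paper's exact objective.

```python
# Illustrative sketch, not the official SPIN-Diffusion code.
# Assumes: `unet` is the current model, `unet_ref` is a frozen copy of the
# previous iterate (the "opponent"), and `synth_latents` were sampled from
# `unet_ref` for the same prompts as the real, human-preferred images.
import torch
import torch.nn.functional as F


def spin_diffusion_step(unet, unet_ref, noise_scheduler,
                        real_latents, synth_latents, text_emb, beta=1.0):
    """One self-play update: denoise real images better, and the opponent's
    own generations worse, than the previous iterate does."""
    bsz = real_latents.shape[0]
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (bsz,), device=real_latents.device)
    noise = torch.randn_like(real_latents)

    # Apply the same noise and timestep to real and self-generated latents.
    noisy_real = noise_scheduler.add_noise(real_latents, noise, t)
    noisy_synth = noise_scheduler.add_noise(synth_latents, noise, t)

    def denoise_error(model, noisy):
        pred = model(noisy, t, encoder_hidden_states=text_emb).sample
        return F.mse_loss(pred, noise, reduction="none").mean(dim=[1, 2, 3])

    # Errors of the current model (trainable).
    err_real = denoise_error(unet, noisy_real)
    err_synth = denoise_error(unet, noisy_synth)

    # Errors of the frozen previous-iteration model.
    with torch.no_grad():
        ref_real = denoise_error(unet_ref, noisy_real)
        ref_synth = denoise_error(unet_ref, noisy_synth)

    # Logistic loss on the relative improvement over the opponent: the margin
    # is driven negative when the current model fits real data better and the
    # opponent's samples worse than the opponent itself does.
    margin = beta * ((err_real - ref_real) - (err_synth - ref_synth))
    return -F.logsigmoid(-margin).mean()
```

In an iterative scheme, this step would be run until convergence, after which `unet_ref` is replaced by the updated `unet` and fresh synthetic images are regenerated, so the model keeps competing against its most recent version without requiring paired "winner"/"loser" preference data.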

