Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
February 15, 2024
Authors: Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu
cs.AI
Abstract
Fine-tuning Diffusion Models remains an underexplored frontier in generative
artificial intelligence (GenAI), especially when compared with the remarkable
progress made in fine-tuning Large Language Models (LLMs). While cutting-edge
diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised
fine-tuning, their performance inevitably plateaus after seeing a certain
volume of data. Recently, reinforcement learning (RL) has been employed to
fine-tune diffusion models with human preference data, but it requires at least
two images ("winner" and "loser" images) for each text prompt. In this paper,
we introduce an innovative technique called self-play fine-tuning for diffusion
models (SPIN-Diffusion), where the diffusion model engages in competition with
its earlier versions, facilitating an iterative self-improvement process. Our
approach offers an alternative to conventional supervised fine-tuning and RL
strategies, significantly improving both model performance and alignment. Our
experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms
the existing supervised fine-tuning method in aspects of human preference
alignment and visual appeal right from its first iteration. By the second
iteration, it exceeds the performance of RLHF-based methods across all metrics,
achieving these results with less data.
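
The abstract describes the core mechanism only at a high level: at each iteration the current model plays against a frozen copy of its earlier checkpoint, treating the dataset image for a prompt as the "winner" and its own generation as the "loser". The sketch below illustrates one such self-play iteration in PyTorch-style code. It is a minimal illustration, not the paper's exact formulation: the logistic objective over denoising errors is assumed to follow a DPO-style form, and the helpers `generate` and `denoising_loss` as well as the hyperparameters `beta` and `lr` are hypothetical placeholders.

```python
import copy
import torch
import torch.nn.functional as F

# Hypothetical helpers (assumptions, not from the paper):
#   denoising_loss(model, image, prompt) -> per-sample diffusion denoising loss
#   generate(model, prompt)              -> an image sampled from the model

def spin_diffusion_iteration(model, dataset, num_epochs=1, beta=1.0, lr=1e-5):
    """One self-play iteration: the trainable model competes with a frozen
    copy of itself (the 'opponent' from the previous iteration)."""
    opponent = copy.deepcopy(model).eval()  # frozen earlier version
    for p in opponent.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    for _ in range(num_epochs):
        for prompt, real_image in dataset:  # one real image per prompt suffices
            with torch.no_grad():
                fake_image = generate(opponent, prompt)  # self-generated "loser"

            # Denoising losses of the trainable model and the frozen opponent
            # on the real ("winner") and self-generated ("loser") images.
            l_real = denoising_loss(model, real_image, prompt)
            l_fake = denoising_loss(model, fake_image, prompt)
            l_real_ref = denoising_loss(opponent, real_image, prompt)
            l_fake_ref = denoising_loss(opponent, fake_image, prompt)

            # DPO-style logistic objective: relative to the opponent, fit the
            # real image better and the self-generated image worse.
            margin = (l_real - l_real_ref) - (l_fake - l_fake_ref)
            loss = -F.logsigmoid(-beta * margin).mean()

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    return model
```

In contrast to RLHF-style preference fine-tuning, each training example here requires only a single reference image per prompt: the "loser" is synthesized on the fly by the model's own earlier version, which is what makes the iterative self-improvement loop possible without paired human preference data.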