Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
February 15, 2024
Authors: Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu
cs.AI
Abstract
Fine-tuning Diffusion Models remains an underexplored frontier in generative
artificial intelligence (GenAI), especially when compared with the remarkable
progress made in fine-tuning Large Language Models (LLMs). While cutting-edge
diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised
fine-tuning, their performance inevitably plateaus after seeing a certain
volume of data. Recently, reinforcement learning (RL) has been employed to
fine-tune diffusion models with human preference data, but it requires at least
two images ("winner" and "loser" images) for each text prompt. In this paper,
we introduce an innovative technique called self-play fine-tuning for diffusion
models (SPIN-Diffusion), where the diffusion model engages in competition with
its earlier versions, facilitating an iterative self-improvement process. Our
approach offers an alternative to conventional supervised fine-tuning and RL
strategies, significantly improving both model performance and alignment. Our
experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms
the existing supervised fine-tuning method in aspects of human preference
alignment and visual appeal right from its first iteration. By the second
iteration, it exceeds the performance of RLHF-based methods across all metrics,
achieving these results with less data.
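
The abstract describes the core mechanism only at a high level: at each iteration the current model plays against a frozen copy of its earlier checkpoint, treating the dataset image for a prompt as the "winner" and its own generation as the "loser". The sketch below illustrates one such self-play iteration in PyTorch-style code. It is a minimal illustration, not the paper's exact formulation: the logistic objective over denoising errors is assumed to follow a DPO-style form, and the helpers `generate` and `denoising_loss` as well as the hyperparameters `beta` and `lr` are hypothetical placeholders.

```python
import copy
import torch
import torch.nn.functional as F

# Hypothetical helpers (assumptions, not from the paper):
#   denoising_loss(model, image, prompt) -> per-sample diffusion denoising loss
#   generate(model, prompt)              -> an image sampled from the model

def spin_diffusion_iteration(model, dataset, num_epochs=1, beta=1.0, lr=1e-5):
    """One self-play iteration: the trainable model competes with a frozen
    copy of itself (the 'opponent' from the previous iteration)."""
    opponent = copy.deepcopy(model).eval()  # frozen earlier version
    for p in opponent.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    for _ in range(num_epochs):
        for prompt, real_image in dataset:  # one real image per prompt suffices
            with torch.no_grad():
                fake_image = generate(opponent, prompt)  # self-generated "loser"

            # Denoising losses of the trainable model and the frozen opponent
            # on the real ("winner") and self-generated ("loser") images.
            l_real = denoising_loss(model, real_image, prompt)
            l_fake = denoising_loss(model, fake_image, prompt)
            l_real_ref = denoising_loss(opponent, real_image, prompt)
            l_fake_ref = denoising_loss(opponent, fake_image, prompt)

            # DPO-style logistic objective: relative to the opponent, fit the
            # real image better and the self-generated image worse.
            margin = (l_real - l_real_ref) - (l_fake - l_fake_ref)
            loss = -F.logsigmoid(-beta * margin).mean()

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    return model
```

In contrast to RLHF-style preference fine-tuning, each training example here requires only a single reference image per prompt: the "loser" is synthesized on the fly by the model's own earlier version, which is what makes the iterative self-improvement loop possible without paired human preference data.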