Dual Caption Preference Optimization for Diffusion Models
February 9, 2025
Authors: Amir Saeidi, Yiran Luo, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral
cs.AI
Abstract
Recent advancements in human preference optimization, originally developed for Large Language Models (LLMs), have shown significant potential in improving text-to-image diffusion models. These methods aim to learn the distribution of preferred samples while distinguishing them from less preferred ones. However, existing preference datasets often exhibit overlap between these distributions, leading to a conflict distribution. Additionally, we identified that input prompts contain irrelevant information for less preferred images, limiting the denoising network's ability to accurately predict noise in preference optimization methods, known as the irrelevant prompt issue. To address these challenges, we propose Dual Caption Preference Optimization (DCPO), a novel approach that utilizes two distinct captions to mitigate irrelevant prompts. To tackle conflict distribution, we introduce the Pick-Double Caption dataset, a modified version of Pick-a-Pic v2 with separate captions for preferred and less preferred images. We further propose three different strategies for generating distinct captions: captioning, perturbation, and hybrid methods. Our experiments show that DCPO significantly improves image quality and relevance to prompts, outperforming Stable Diffusion (SD) 2.1, SFT_Chosen, Diffusion-DPO, and MaPO across multiple metrics, including Pickscore, HPSv2.1, GenEval, CLIPscore, and ImageReward, fine-tuned on SD 2.1 as the backbone.
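The abstract frames DCPO as a dual-caption variant of diffusion preference optimization. As a rough illustration only, the sketch below shows a Diffusion-DPO-style objective in which the preferred and less preferred images are each conditioned on their own caption embedding, which is the core idea the abstract describes. The function name `dcpo_loss`, the UNet call signature, and the `beta` value are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def dcpo_loss(model, ref_model, x_w_t, x_l_t, noise_w, noise_l,
              cap_w_emb, cap_l_emb, t, beta=5000.0):
    """Minimal sketch of a dual-caption preference objective (assumed from
    the Diffusion-DPO formulation): the denoising error of each image is
    computed against its *own* caption embedding rather than a shared prompt.

    `model` and `ref_model` are assumed to be noise-prediction UNets called
    as unet(noisy_latents, timestep, text_embedding) -> predicted_noise.
    """
    # Per-sample denoising errors for the preferred image, conditioned on its caption.
    err_w = F.mse_loss(model(x_w_t, t, cap_w_emb), noise_w,
                       reduction="none").mean(dim=(1, 2, 3))
    err_w_ref = F.mse_loss(ref_model(x_w_t, t, cap_w_emb), noise_w,
                           reduction="none").mean(dim=(1, 2, 3))
    # Per-sample denoising errors for the less preferred image, conditioned on its caption.
    err_l = F.mse_loss(model(x_l_t, t, cap_l_emb), noise_l,
                       reduction="none").mean(dim=(1, 2, 3))
    err_l_ref = F.mse_loss(ref_model(x_l_t, t, cap_l_emb), noise_l,
                           reduction="none").mean(dim=(1, 2, 3))
    # DPO-style logistic loss on the difference of implicit rewards:
    # reward margin is the improvement over the reference on the preferred
    # image minus the improvement on the less preferred image.
    margin = (err_w - err_w_ref) - (err_l - err_l_ref)
    return -F.logsigmoid(-beta * margin).mean()
```

In this sketch the only change relative to a single-caption preference loss is that `cap_w_emb` and `cap_l_emb` come from the two distinct captions of the Pick-Double Caption dataset, which is how the abstract says DCPO avoids conditioning the less preferred image on an irrelevant prompt.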