Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback
May 30, 2024
Authors: Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee
cs.AI
Abstract
The generation of high-quality human images through text-to-image (T2I)
methods is a significant yet challenging task. Distinct from general image
generation, human image synthesis must satisfy stringent criteria related to
human pose, anatomy, and alignment with textual prompts, making it particularly
difficult to achieve realistic results. Recent advancements in T2I generation
based on diffusion models have shown promise, yet challenges remain in meeting
human-specific preferences. In this paper, we introduce a novel approach
tailored specifically for human image generation utilizing Direct Preference
Optimization (DPO). Specifically, we introduce an efficient method for
constructing a specialized DPO dataset for training human image generation
models without the need for costly human feedback. We also propose a modified
loss function that enhances the DPO training process by minimizing artifacts
and improving image fidelity. Our method demonstrates its versatility and
effectiveness in generating human images, including personalized text-to-image
generation. Through comprehensive evaluations, we show that our approach
significantly advances the state of human image generation, achieving superior
results in terms of natural anatomies, poses, and text-image alignment.
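The abstract describes two ingredients: preference pairs built from AI feedback instead of human labels, and a DPO training loss. The sketch below shows a generic version of each under stated assumptions. `score_fn` is a hypothetical stand-in for an AI-feedback scorer, and the "best vs. worst candidate per prompt" pairing rule is an assumed recipe, not necessarily the paper's dataset construction. The loss is the standard Diffusion-DPO objective (Wallace et al., 2023), with the timestep weighting folded into `beta`. It is not the authors' modified loss, whose form the abstract does not give.

```python
import numpy as np


def build_preference_pair(prompt, candidates, score_fn):
    """Pick a (winner, loser) pair among candidate images for one prompt.

    score_fn is a HYPOTHETICAL AI-feedback scorer: score_fn(prompt, image) -> float,
    where higher means preferred. Best-vs-worst pairing is an assumed recipe.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    scores = np.array([score_fn(prompt, img) for img in candidates])
    winner = candidates[int(np.argmax(scores))]
    loser = candidates[int(np.argmin(scores))]
    return winner, loser


def _per_sample_mse(pred, target):
    # Mean squared error over all non-batch axes -> shape (batch,)
    axes = tuple(range(1, pred.ndim))
    return np.mean((pred - target) ** 2, axis=axes)


def _log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)


def diffusion_dpo_loss(noise_w, noise_l, policy_pred_w, policy_pred_l,
                       ref_pred_w, ref_pred_l, beta):
    """Standard Diffusion-DPO loss on noise predictions (not the paper's modified loss).

    noise_*:       true noise added to the winner (w) / loser (l) latents
    policy_pred_*: noise predicted by the model being trained
    ref_pred_*:    noise predicted by the frozen reference model
    beta:          preference-strength hyperparameter (absorbs timestep weighting)
    """
    # How much better the policy denoises the winner than the loser...
    policy_diff = (_per_sample_mse(policy_pred_w, noise_w)
                   - _per_sample_mse(policy_pred_l, noise_l))
    # ...measured relative to the same gap for the reference model.
    ref_diff = (_per_sample_mse(ref_pred_w, noise_w)
                - _per_sample_mse(ref_pred_l, noise_l))
    logits = -0.5 * beta * (policy_diff - ref_diff)
    # loss = -log sigmoid(logits), averaged over the batch
    return float(-np.mean(_log_sigmoid(logits)))
```

When the policy equals the reference model the logits are zero and the loss is log 2. Denoising the preferred image better than the reference model does pushes the loss below that value. The value of `beta` is model-dependent and must be tuned.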