Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback
May 30, 2024
Authors: Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee
cs.AI
Abstract
The generation of high-quality human images through text-to-image (T2I)
methods is a significant yet challenging task. Distinct from general image
generation, human image synthesis must satisfy stringent criteria related to
human pose, anatomy, and alignment with textual prompts, making it particularly
difficult to achieve realistic results. Recent advancements in T2I generation
based on diffusion models have shown promise, yet challenges remain in meeting
human-specific preferences. In this paper, we introduce a novel approach
tailored specifically for human image generation utilizing Direct Preference
Optimization (DPO). Specifically, we introduce an efficient method for
constructing a specialized DPO dataset for training human image generation
models without the need for costly human feedback. We also propose a modified
loss function that enhances the DPO training process by minimizing artifacts
and improving image fidelity. Our method demonstrates its versatility and
effectiveness in generating human images, including personalized text-to-image
generation. Through comprehensive evaluations, we show that our approach
significantly advances the state of human image generation, achieving superior
results in terms of natural anatomies, poses, and text-image alignment.
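
For readers unfamiliar with DPO, the following is a minimal sketch of the two ideas the abstract names: building a preference pair from automated (AI) scores instead of human labels, and the standard DPO loss on that pair. It is not the authors' implementation. The abstract does not specify the scoring model, the pair-selection rule, or the modified loss, so the best-versus-worst selection rule, the function names `build_preference_pair` and `dpo_loss`, and the value `beta=0.1` are all assumptions made for illustration. The loss shown is the generic DPO objective on log-likelihoods, not the paper's modified variant or a diffusion-specific formulation.

```python
import math


def build_preference_pair(candidates, scores):
    """Return (winner, loser) for one prompt.

    Assumption: `scores` come from some automated preference scorer
    (hypothetical here), and the pair is the highest- and lowest-scored
    candidate. The paper's actual selection rule may differ.
    """
    if len(candidates) != len(scores) or len(candidates) < 2:
        raise ValueError("need at least two candidates, one score each")
    order = sorted(range(len(candidates)), key=lambda i: scores[i])
    return candidates[order[-1]], candidates[order[0]]


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for a single (winner, loser) pair.

    loss = -log(sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))))

    logp_*     : log-likelihoods under the model being trained
    ref_logp_* : log-likelihoods under the frozen reference model
    beta       : strength of the pull toward the reference (assumed value)
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Placeholder file names and scores, purely illustrative.
    images = ["img_a.png", "img_b.png", "img_c.png"]
    ai_scores = [0.2, 0.9, 0.1]
    winner, loser = build_preference_pair(images, ai_scores)
    print(winner, loser)
    # The model already prefers the winner slightly more than the reference
    # does, so the loss falls below log(2), its value at zero margin.
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

When the trained model and the reference agree exactly, the margin is zero and the loss equals log 2. Training lowers the loss by raising the winner's likelihood relative to the loser's, measured against the reference model.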