Scalable Ranked Preference Optimization for Text-to-Image Generation
October 23, 2024
Authors: Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata, Sergey Tulyakov, Jian Ren, Anil Kag
cs.AI
Abstract
Direct Preference Optimization (DPO) has emerged as a powerful approach to
align text-to-image (T2I) models with human feedback. Unfortunately, successful
application of DPO to T2I models requires a huge amount of resources to collect
and label large-scale datasets, e.g., millions of generated paired images
annotated with human preferences. In addition, these human preference datasets
can get outdated quickly as the rapid improvements of T2I models lead to higher
quality images. In this work, we investigate a scalable approach for collecting
large-scale and fully synthetic datasets for DPO training. Specifically, the
preferences for paired images are generated using a pre-trained reward
function, eliminating the need to involve humans in the annotation process and
greatly improving dataset collection efficiency. Moreover, we demonstrate
that such datasets allow averaging predictions across multiple models and
collecting ranked preferences as opposed to pairwise preferences. Furthermore,
we introduce RankDPO to enhance DPO-based methods using ranking feedback.
Applying RankDPO to the SDXL and SD3-Medium models with our synthetically generated
preference dataset "Syn-Pic" improves both prompt-following (on benchmarks
like T2I-CompBench, GenEval, and DPG-Bench) and visual quality (through user
studies). This pipeline presents a practical and scalable solution to develop
better preference datasets to enhance the performance of text-to-image models.
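
To make the data-collection step concrete, the following is a minimal sketch of how ranked preferences could be derived from pre-trained reward functions as described above: several reward models score each generated image for a prompt, the scores are averaged across models, and the candidates are sorted into a ranking rather than a single winner/loser pair. The scoring interface and function names are assumptions for illustration, not the paper's exact setup.

# A minimal sketch (assumed interface, not the paper's code): score each image
# with several pre-trained reward functions, average the scores across models,
# and return a ranking over the candidates instead of a pairwise preference.
from typing import Callable, List, Sequence

def rank_candidates(
    prompt: str,
    images: Sequence[object],  # e.g. PIL images or tensors from the T2I model
    reward_fns: List[Callable[[str, object], float]],  # pre-trained reward models
) -> List[int]:
    # Average each image's score over all reward models.
    avg_scores = [
        sum(fn(prompt, img) for fn in reward_fns) / len(reward_fns)
        for img in images
    ]
    # Indices sorted from most to least preferred (higher average reward = better).
    return sorted(range(len(images)), key=lambda i: avg_scores[i], reverse=True)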
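
The abstract does not spell out the RankDPO objective. The sketch below shows one generic way a DPO-style pairwise loss can be extended to consume a full ranking: sum the pairwise terms implied by the ranking and weight them with a DCG-style discount. The weighting scheme and the log-likelihood-ratio interface are illustrative assumptions, not necessarily the paper's formulation.

# Illustrative ranking-aware DPO-style loss (assumed formulation, not the exact
# RankDPO objective). Each pair implied by the ranking contributes a standard
# DPO term, weighted so that pairs near the top of the ranking matter more.
import math
import torch
import torch.nn.functional as F

def ranked_dpo_loss(log_ratio: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    # log_ratio[i] = log p_theta(x_i | c) - log p_ref(x_i | c) for the i-th image,
    # with indices ordered from most to least preferred under the synthetic ranking.
    k = log_ratio.shape[0]
    terms, gains = [], []
    for i in range(k):
        for j in range(i + 1, k):
            # Standard pairwise DPO term: image i should be preferred over image j.
            pair_loss = -F.logsigmoid(beta * (log_ratio[i] - log_ratio[j]))
            # Assumed DCG-style weight: errors at higher ranks are penalized more.
            gain = 1.0 / math.log2(i + 2) - 1.0 / math.log2(j + 2)
            terms.append(gain * pair_loss)
            gains.append(gain)
    # Normalize by total weight so the loss scale stays comparable to pairwise DPO.
    return torch.stack(terms).sum() / sum(gains)

For a two-image ranking this reduces, up to the weight normalization, to the ordinary pairwise DPO loss, which is the sense in which a ranking-based objective can be seen as an extension of DPO-based methods to ranked feedback.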