Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs

Abstract

Recent advances in diffusion-based text-to-image (T2I) models have led to remarkable success in generating high-quality images from textual prompts. However, ensuring accurate alignment between the text and the generated image remains a significant challenge for state-of-the-art diffusion models. To address this, existing studies employ reinforcement learning from human feedback (RLHF) to align T2I outputs with human preferences. These methods, however, either rely directly on paired image preference data or require a learned reward function, both of which depend heavily on costly, high-quality human annotations and thus face scalability limitations. In this work, we introduce Text Preference Optimization (TPO), a framework that enables "free-lunch" alignment of T2I models, achieving alignment without the need for paired image preference data. TPO works by training the model to prefer matched prompts over mismatched prompts, which are constructed by perturbing original captions using a large language model. Our framework is general and compatible with existing preference-based algorithms. We extend both DPO and KTO to our setting, resulting in TDPO and TKTO. Quantitative and qualitative evaluations across multiple benchmarks show that our methods consistently outperform their original counterparts, delivering better human preference scores and improved text-to-image alignment. Our open-source code is available at https://github.com/DSL-Lab/T2I-Free-Lunch-Alignment.
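To make the idea of preferring matched prompts over mismatched ones concrete, the following is a minimal sketch of a DPO-style logistic loss applied to a prompt pair for a single image. It is an illustration under stated assumptions, not the TDPO objective from the paper: the function name `text_preference_loss`, its arguments, and the use of per-sample denoising errors as a stand-in for log-likelihoods are all hypothetical. The linked repository is the place to look for the actual formulation.

```python
import math


def text_preference_loss(err_matched, err_mismatched,
                         ref_err_matched, ref_err_mismatched, beta=1.0):
    """DPO-style loss for one image and a (matched, mismatched) prompt pair.

    All names here are hypothetical. Each `err_*` is assumed to be a scalar
    denoising error of the same noised image, conditioned on the matched or
    the mismatched prompt, under the trained model (`err_*`) or a frozen
    reference model (`ref_err_*`). Lower error is treated as higher likelihood.
    """
    # How much the trained model improves on the reference under each prompt.
    gain_matched = ref_err_matched - err_matched
    gain_mismatched = ref_err_mismatched - err_mismatched

    # The loss rewards improving more under the matched prompt.
    margin = beta * (gain_matched - gain_mismatched)

    # -log(sigmoid(margin)) = log(1 + exp(-margin)), evaluated stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In this sketch the loss equals log 2 when the trained model and the reference agree, and it falls below that value once the trained model fits the image better under the matched prompt than under the perturbed one.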
