

MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

October 29, 2025
作者: Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Vicky Kalogeiton, David Picard
cs.AI

Abstract

Current text-to-image generative models are trained on large uncurated datasets to enable diverse generation capabilities. However, this does not align well with user preferences. Recently, reward models have been specifically designed to perform post-hoc selection of generated images and align them to a reward, typically user preference. Discarding informative data and optimizing for a single reward in this way tend to harm diversity, semantic fidelity, and efficiency. Instead of this post-processing, we propose to condition the model on multiple reward models during training, so that the model learns user preferences directly. We show that this not only dramatically improves the visual quality of the generated images but also significantly speeds up training. Our proposed method, called MIRO, achieves state-of-the-art performance on the GenEval compositional benchmark and on user-preference scores (PickAScore, ImageReward, HPSv2).
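
To make the core idea concrete, here is a minimal sketch of what multi-reward conditioning could look like in PyTorch. This is not the authors' implementation: the `RewardConditioner` module, dimensions, and the additive fusion with the text embedding are illustrative assumptions. The idea it captures is that each training image carries a vector of scores from several reward models (e.g. PickAScore, ImageReward, HPSv2), which is embedded and fed to the generator as an extra conditioning signal alongside the text prompt.

```python
# Hypothetical sketch of multi-reward conditioned training (not the MIRO code).
import torch
import torch.nn as nn

class RewardConditioner(nn.Module):
    """Maps a vector of K reward scores to a conditioning embedding (illustrative)."""
    def __init__(self, num_rewards: int, cond_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_rewards, cond_dim),
            nn.SiLU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, reward_scores: torch.Tensor) -> torch.Tensor:
        # reward_scores: (batch, num_rewards), assumed pre-normalized per reward model
        return self.mlp(reward_scores)

# Toy usage: fuse text conditioning with the reward-score embedding.
batch, num_rewards, cond_dim = 4, 3, 768
text_emb = torch.randn(batch, cond_dim)     # stand-in for a text-encoder output
rewards = torch.rand(batch, num_rewards)    # stand-in for precomputed reward scores
conditioner = RewardConditioner(num_rewards, cond_dim)
cond = text_emb + conditioner(rewards)      # conditioning signal passed to the generator
```

At sampling time, one would set the reward inputs to high target values to steer generation toward preferred images, rather than filtering outputs after the fact.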