Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
March 17, 2025
Authors: Yihong Luo, Tianyang Hu, Weijian Luo, Kenji Kawaguchi, Jing Tang
cs.AI
Abstract
Aligning generated images to complicated text prompts and human preferences
is a central challenge in Artificial Intelligence-Generated Content (AIGC).
With reward-enhanced diffusion distillation emerging as a promising approach
that boosts controllability and fidelity of text-to-image models, we identify a
fundamental paradigm shift: as conditions become more specific and reward
signals stronger, the rewards themselves become the dominant force in
generation. In contrast, the diffusion losses serve as an overly expensive form
of regularization. To thoroughly validate our hypothesis, we introduce R0, a
novel conditional generation approach via regularized reward maximization.
Instead of relying on tricky diffusion distillation losses, R0 proposes a new
perspective that treats image generation as an optimization problem in data
space, searching for valid images with high compositional rewards. Through
innovative designs of the generator parameterization and proper regularization
techniques, we train state-of-the-art few-step text-to-image generative models
with R0 at scale. Our results challenge the conventional
wisdom of diffusion post-training and conditional generation by demonstrating
that rewards play a dominant role in scenarios with complex conditions. We hope
our findings can contribute to further research into human-centric and
reward-centric generation paradigms across the broader field of AIGC. Code is
available at https://github.com/Luo-Yihong/R0.
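To make "regularized reward maximization" concrete, the sketch below is a minimal, hypothetical illustration under simplified assumptions, not the released R0 implementation: a few-step generator is trained by maximizing a frozen differentiable reward on its own samples while a regularization term keeps the samples valid. The generator, reward model, regularizer, prompt embeddings, and hyperparameters here are all placeholders.

```python
# Minimal sketch of regularized reward maximization (illustrative only, not the authors' R0 code).
# Assumptions: TinyGenerator, TinyReward, and the L2 regularizer are stand-ins; the paper uses
# its own generator parameterization, reward models, and regularization techniques.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in one-step generator: maps noise plus a prompt embedding to an image vector."""
    def __init__(self, noise_dim=64, img_dim=3 * 32 * 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, noise, prompt_emb):
        return self.net(torch.cat([noise, prompt_emb], dim=-1))

class TinyReward(nn.Module):
    """Stand-in differentiable reward scoring image-text compatibility (frozen during training)."""
    def __init__(self, img_dim=3 * 32 * 32, emb_dim=64):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, 128)
        self.txt_proj = nn.Linear(emb_dim, 128)

    def forward(self, img, prompt_emb):
        # Similarity-style score between image and prompt features.
        return (self.img_proj(img) * self.txt_proj(prompt_emb)).sum(-1)

generator, reward = TinyGenerator(), TinyReward()
for p in reward.parameters():
    p.requires_grad_(False)  # the reward model stays frozen; only the generator is optimized

opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
lam = 0.1  # placeholder weight balancing reward maximization against regularization

for step in range(100):
    noise = torch.randn(8, 64)
    prompt_emb = torch.randn(8, 64)        # placeholder text embeddings
    img = generator(noise, prompt_emb)
    r = reward(img, prompt_emb).mean()     # reward on generated samples (to be maximized)
    reg = img.pow(2).mean()                # placeholder regularizer keeping samples well-behaved
    loss = -r + lam * reg                  # regularized reward maximization objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sketch is only the shape of the objective: no diffusion loss appears, and the reward term, tempered by a regularizer, drives generation directly in data space.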