PromptRL: Prompt Matters in RL for Flow-Based Image Generation
February 1, 2026
Authors: Fu-Yun Wang, Han Zhang, Michael Gharbi, Hongsheng Li, Taesung Park
cs.AI
Abstract
Flow matching models (FMs) have revolutionized text-to-image (T2I) generation, with reinforcement learning (RL) serving as a critical post-training strategy for alignment with reward objectives. In this research, we show that current RL pipelines for FMs suffer from two underappreciated yet important limitations: sample inefficiency due to insufficient generation diversity, and pronounced prompt overfitting, where models memorize specific training formulations and exhibit dramatic performance collapse when evaluated on semantically equivalent but stylistically varied prompts. We present PromptRL (Prompt Matters in RL for Flow-Based Image Generation), a framework that incorporates language models (LMs) as trainable prompt refinement agents directly within the flow-based RL optimization loop. This design yields two complementary benefits: rapid development of sophisticated prompt rewriting capabilities and, critically, a synergistic training regime that reshapes the optimization dynamics. PromptRL achieves state-of-the-art performance across multiple benchmarks, obtaining scores of 0.97 on GenEval, 0.98 on OCR accuracy, and 24.05 on PickScore.
Furthermore, we validate the effectiveness of our RL approach on large-scale image editing models, improving the EditReward of FLUX.1-Kontext from 1.19 to 1.43 with only 60K rollouts, surpassing Gemini 2.5 Flash Image (also known as Nano Banana), which scores 1.37, and matching ReasonNet (1.44), which relies on fine-grained data annotations and a complex multi-stage training pipeline. Our extensive experiments empirically demonstrate that PromptRL consistently achieves higher performance ceilings while requiring over 2× fewer rollouts than naive flow-only RL. Our code is available at https://github.com/G-U-N/UniRL.
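As a rough illustration of the joint loop the abstract describes, here is a minimal, hypothetical Python sketch: a language-model policy rewrites the prompt, the flow model generates from the rewritten prompt, and a single reward signal drives updates to both. Every name and quantity below (rewrite_prompt, generate_image, reward_fn, the scalar "skill" parameters) is an illustrative stand-in under assumed simplifications, not the authors' implementation or API.

```python
# Hypothetical sketch of a joint prompt-rewriting + flow-model RL loop.
# All functions are toy stand-ins; real PromptRL trains an LM and a flow
# matching model with policy-gradient updates against a learned reward.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rewrite_prompt(lm_skill, prompt):
    # Stand-in for sampling a refined prompt from the LM policy; a higher
    # "skill" makes a helpful style token more likely.
    if random.random() < sigmoid(lm_skill):
        return prompt + ", highly detailed"
    return prompt

def generate_image(fm_skill, prompt):
    # Stand-in for flow-matching sampling; quality depends on both the
    # generator and the (possibly rewritten) prompt.
    quality = fm_skill + 0.5 * ("highly detailed" in prompt) + random.gauss(0, 0.1)
    return {"prompt": prompt, "quality": quality}

def reward_fn(image):
    # Stand-in for a learned reward model (e.g., PickScore or OCR accuracy).
    return image["quality"]

lm_skill, fm_skill, baseline = 0.0, 0.0, 0.0
for step in range(2000):
    prompt = rewrite_prompt(lm_skill, "a cat reading a newspaper")
    image = generate_image(fm_skill, prompt)
    r = reward_fn(image)
    advantage = r - baseline               # reward minus a moving baseline
    baseline = 0.99 * baseline + 0.01 * r
    # Placeholder "policy" updates: nudge both scalar skills by the
    # advantage, standing in for gradient steps on the LM and flow model.
    lm_skill += 0.01 * advantage
    fm_skill += 0.01 * advantage
```

The point of the sketch is the coupling: because the prompt agent and the generator are updated against the same reward inside one loop, improvements in prompt rewriting and in generation reinforce each other, which is the synergistic training regime the abstract attributes to PromptRL.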