RewardFlow: Generate Images by Optimizing What You Reward

April 9, 2026
Authors: Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou
cs.AI

Abstract

We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterogeneous objectives, we design a prompt-aware adaptive policy that extracts semantic primitives from the instruction, infers edit intent, and dynamically modulates reward weights and step sizes throughout sampling. Across several image editing and compositional generation benchmarks, RewardFlow delivers state-of-the-art edit fidelity and compositional alignment.
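The core mechanism described above, steering sampling with a weighted sum of reward gradients plus injected noise, can be illustrated with a minimal Langevin-guidance sketch. This is a toy on a 2-D latent with simple quadratic stand-in rewards, not the paper's actual rewards or models; the targets, weights, temperature, and annealing schedule below are all illustrative assumptions.

```python
import numpy as np

def reward_grads(x):
    # Each toy "reward" pulls the latent toward a different target point,
    # standing in for semantic-alignment / fidelity / preference rewards.
    targets = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    # Gradient of R_i(x) = -||x - t_i||^2 is 2 * (t_i - x).
    return [2.0 * (t - x) for t in targets]

def langevin_step(x, weights, step_size, temperature, rng):
    # One Langevin update: ascend the weighted reward gradient, add noise.
    grad = sum(w * g for w, g in zip(weights, reward_grads(x)))
    noise = rng.standard_normal(x.shape)
    return x + step_size * grad + np.sqrt(2.0 * step_size * temperature) * noise

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
weights = np.array([0.4, 0.4, 0.2])  # a prompt-aware policy would adapt these
for t in range(200):
    step = 0.05 * (1.0 - t / 200)    # annealed step size
    x = langevin_step(x, weights, step, temperature=0.01, rng=rng)
print(x)  # should land near the weighted mix of targets, (0.5, 0.5)
```

With the weights summing to one, the deterministic fixed point is the weighted average of the targets; the small temperature keeps the samples concentrated around it while the annealed step size freezes the trajectory at the end, mirroring how inference-time guidance tightens as sampling finishes.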