RewardFlow: Generate Images by Optimizing What You Reward
April 9, 2026
Authors: Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou
cs.AI
Abstract
We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterogeneous objectives, we design a prompt-aware adaptive policy that extracts semantic primitives from the instruction, infers edit intent, and dynamically modulates reward weights and step sizes throughout sampling. Across several image editing and compositional generation benchmarks, RewardFlow delivers state-of-the-art edit fidelity and compositional alignment.
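To make the inference-time procedure concrete, below is a minimal PyTorch sketch of a multi-reward Langevin update of the kind the abstract describes. This is an illustrative assumption, not the paper's actual implementation or API: the names `denoiser`, `reward_fns`, `weights`, and `step_size` are hypothetical, and in RewardFlow the per-reward weights and step sizes would be set dynamically by the prompt-aware adaptive policy rather than passed in as fixed values.

```python
# Hypothetical sketch of multi-reward Langevin guidance at inference time.
# The update rule is standard Langevin dynamics with the target score
# replaced by the gradient of a weighted sum of differentiable rewards.
import torch

def langevin_guided_step(x, t, denoiser, reward_fns, weights,
                         step_size, n_steps=3):
    """Refine the current sample x at noise level t with a few Langevin
    steps that ascend a weighted combination of differentiable rewards.

    denoiser   -- maps (x, t) to a predicted clean image x0_hat
                  (assumed interface; no inversion of the sampler needed)
    reward_fns -- differentiable scalar rewards (e.g. semantic alignment,
                  perceptual fidelity, a VQA-based score)
    weights    -- per-reward coefficients; in RewardFlow these would come
                  from the prompt-aware adaptive policy
    """
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        # Predict the clean image so image-space rewards can be evaluated.
        x0_hat = denoiser(x, t)
        # Weighted sum of heterogeneous reward signals, all differentiable.
        total_reward = sum(w * r(x0_hat) for w, r in zip(weights, reward_fns))
        grad = torch.autograd.grad(total_reward, x)[0]
        noise = torch.randn_like(x)
        # Langevin update: reward-gradient ascent plus injected noise.
        x = x.detach() + step_size * grad + (2 * step_size) ** 0.5 * noise
    return x
```

One design point worth noting under these assumptions: evaluating rewards on the predicted clean image `x0_hat` rather than on the noisy latent is what lets off-the-shelf differentiable rewards steer sampling without inverting the pretrained model, consistent with the inversion-free framing above.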