RewardFlow: Generate Images by Optimizing What You Reward

Abstract

We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterogeneous objectives, we design a prompt-aware adaptive policy that extracts semantic primitives from the instruction, infers edit intent, and dynamically modulates reward weights and step sizes throughout sampling. Across several image editing and compositional generation benchmarks, RewardFlow delivers state-of-the-art edit fidelity and compositional alignment.
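
The abstract does not spell out the update rule, so the following is only a rough illustration of what a multi-reward Langevin step with time-varying weights and step sizes might look like. It is a minimal NumPy sketch under stated assumptions, not the RewardFlow implementation: the function name `multi_reward_langevin`, the quadratic toy rewards, and the `weights` / `step_size` schedules (standing in for the prompt-aware adaptive policy) are all hypothetical, and no diffusion or flow-matching model is involved.

```python
import numpy as np

def multi_reward_langevin(x0, reward_grads, weights, step_size, n_steps,
                          noise_scale=1.0, seed=0):
    """Toy Langevin ascent on a weighted sum of rewards (illustrative only).

    reward_grads: list of callables, each returning the gradient of one
                  reward at x (hypothetical stand-ins for reward models).
    weights:      callable t -> per-reward weights (stand-in for a policy).
    step_size:    callable t -> float step size at step t.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for t in range(n_steps):
        w = weights(t)
        eta = step_size(t)
        # Combine the per-reward gradients with the current weights.
        grad = sum(w_i * g(x) for w_i, g in zip(w, reward_grads))
        # Langevin step: gradient ascent plus scaled Gaussian noise.
        noise = rng.standard_normal(x.shape)
        x = x + eta * grad + noise_scale * np.sqrt(2.0 * eta) * noise
    return x

# Two toy rewards R_i(x) = -0.5 * ||x - target_i||^2, with gradient target_i - x.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
grads = [lambda x: a - x, lambda x: b - x]

# With the noise switched off, the iterate approaches the weighted mean of the targets.
x_final = multi_reward_langevin(
    np.zeros(2), grads,
    weights=lambda t: np.array([0.5, 0.5]),
    step_size=lambda t: 0.1,
    n_steps=200, noise_scale=0.0,
)
print(x_final)
```

In this toy setting the weights decide which reward dominates the final sample. A schedule that varies `weights(t)` and `step_size(t)` over the steps plays the role the abstract assigns to the adaptive policy.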
