RewardFlow: Generate Images by Optimizing What You Reward

Abstract

We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterogeneous objectives, we design a prompt-aware adaptive policy that extracts semantic primitives from the instruction, infers edit intent, and dynamically modulates reward weights and step sizes throughout sampling. Across several image editing and compositional generation benchmarks, RewardFlow delivers state-of-the-art edit fidelity and compositional alignment.
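
The abstract does not spell out the update rule, so the following is only a toy sketch of what Langevin-style steering with several weighted rewards and a time-varying weight schedule might look like. It is not the authors' implementation: the function names (langevin_step, quadratic_reward_grad, run), the quadratic toy rewards, the linear weight schedule, and the 2-D state standing in for an image latent are all assumptions made for illustration.

```python
import math
import random


def quadratic_reward_grad(target):
    """Gradient of the toy reward r(x) = -0.5 * ||x - target||^2 (a stand-in for a learned reward)."""
    return lambda x: [t - xi for xi, t in zip(x, target)]


def langevin_step(x, reward_grads, weights, step_size, noise_scale, rng):
    """One Langevin-style update: ascend the weighted sum of reward gradients, then add Gaussian noise."""
    grads = [g(x) for g in reward_grads]
    new_x = []
    for i, xi in enumerate(x):
        drift = sum(w * g[i] for w, g in zip(weights, grads))
        noise = noise_scale * math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
        new_x.append(xi + step_size * drift + noise)
    return new_x


def linear_schedule(t):
    """Hypothetical schedule: move weight from the first reward to the second as t goes 0 -> 1."""
    return [1.0 - t, t]


def run(steps=200, step_size=0.05, noise_scale=0.1, seed=0, weight_schedule=linear_schedule):
    """Run the toy sampler from the origin with two competing rewards."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    reward_grads = [
        quadratic_reward_grad([1.0, 0.0]),  # toy reward A pulls toward (1, 0)
        quadratic_reward_grad([0.0, 1.0]),  # toy reward B pulls toward (0, 1)
    ]
    for k in range(steps):
        t = k / max(steps - 1, 1)  # normalized progress in [0, 1]
        x = langevin_step(x, reward_grads, weight_schedule(t), step_size, noise_scale, rng)
    return x


if __name__ == "__main__":
    print(run())
```

In this sketch the weight schedule plays the role that the abstract assigns to the prompt-aware adaptive policy; in the actual method, weights and step sizes are said to depend on the instruction and inferred edit intent, which this toy example does not model.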
