

HP-Edit: A Human-Preference Post-Training Framework for Image Editing

April 21, 2026
Authors: Fan Li, Chonghuinan Wang, Lina Lei, Yuping Qiu, Jiaqi Xu, Jiaxiu Jiang, Xinran Qin, Zhikai Chen, Fenglong Song, Zhixin Wang, Renjing Pei, Wangmeng Zuo
cs.AI

Abstract

Image editing tasks today typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Meanwhile, although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, due to the lack of scalable human-preference datasets and of frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset covering eight common tasks with balanced coverage of general object editing. Specifically, HP-Edit leverages a small amount of human-preference scoring data and a pretrained visual large language model (VLM) to develop HP-Scorer, an automatic, human-preference-aligned evaluator. We then use HP-Scorer both to efficiently build a scalable preference dataset and to serve as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly enhances models such as Qwen-Image-Edit-2509, aligning their outputs more closely with human preference.
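The pipeline described above, where an automatic scorer both ranks candidate edits into preference pairs and supplies a reward signal, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `hp_score` is a hypothetical stand-in for HP-Scorer (in the paper, a VLM fine-tuned on human-preference ratings), and `build_preference_pairs` is an assumed helper showing how scored candidates could be assembled into (chosen, rejected) pairs for DPO-style post-training.

```python
import random


def hp_score(instruction: str, source_image: str, edited_image: str) -> float:
    """Hypothetical stand-in for HP-Scorer.

    The real scorer is a VLM fine-tuned on human-preference data; here we
    return a deterministic pseudo-random score so the example is runnable.
    """
    random.seed(hash((instruction, source_image, edited_image)) & 0xFFFFFFFF)
    return random.uniform(0.0, 1.0)


def build_preference_pairs(instruction: str, source_image: str,
                           candidates: list[str]) -> list[tuple[str, str]]:
    """Rank candidate edits by score and pair the best against the worst.

    This mirrors how a scalable preference dataset might be assembled:
    the highest-scored edit becomes the 'chosen' sample, the lowest-scored
    the 'rejected' one. The same hp_score can then serve as a reward
    function during RL post-training.
    """
    ranked = sorted(candidates,
                    key=lambda c: hp_score(instruction, source_image, c),
                    reverse=True)
    return [(ranked[0], ranked[-1])]  # (chosen, rejected)


if __name__ == "__main__":
    pairs = build_preference_pairs(
        "remove the car from the street",
        "street.png",
        ["edit_a.png", "edit_b.png", "edit_c.png"],
    )
    chosen, rejected = pairs[0]
    print(f"chosen={chosen}, rejected={rejected}")
```

In an actual training loop, the (chosen, rejected) pairs would feed a preference-optimization objective such as Diffusion-DPO, while the scalar score could act directly as the reward in a Flow-GRPO-style update.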