ChatPaper.ai


HP-Edit: A Human-Preference Post-Training Framework for Image Editing

April 21, 2026
Authors: Fan Li, Chonghuinan Wang, Lina Lei, Yuping Qiu, Jiaqi Xu, Jiaxiu Jiang, Xinran Qin, Zhikai Chen, Fenglong Song, Zhixin Wang, Renjing Pei, Wangmeng Zuo
cs.AI

Abstract

Current image editing tasks typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, owing to the lack of scalable human-preference datasets and frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human-Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset spanning eight common editing tasks with balanced coverage of common object edits. Specifically, HP-Edit leverages a small amount of human preference-scoring data and a pretrained vision large language model (VLM) to develop HP-Scorer, an automatic evaluator aligned with human preference. HP-Scorer serves both to efficiently build a scalable preference dataset and as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly enhances models such as Qwen-Image-Edit-2509, aligning their outputs more closely with human preferences.
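The abstract describes using an automatic scorer both to build a large preference dataset and as a reward signal. As a rough illustration of the first use, the sketch below shows how a scorer could rank candidate edits per prompt and keep (winner, loser) pairs for DPO-style training. This is a minimal sketch under stated assumptions, not the paper's implementation: `build_preference_pairs`, `EditCandidate`, the margin threshold, and the toy stand-in scorer are all hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class EditCandidate:
    prompt: str       # editing instruction
    image_id: str     # identifier of the edited output
    score: float = 0.0

def build_preference_pairs(
    candidates_per_prompt: List[List[EditCandidate]],
    scorer: Callable[[EditCandidate], float],
    margin: float = 0.1,
) -> List[Tuple[EditCandidate, EditCandidate]]:
    """Score every candidate with an automatic evaluator (HP-Scorer in
    the paper) and keep (winner, loser) pairs whose score gap exceeds
    a margin, yielding a DPO-style preference dataset."""
    pairs = []
    for group in candidates_per_prompt:
        for cand in group:
            cand.score = scorer(cand)
        ranked = sorted(group, key=lambda c: c.score, reverse=True)
        best, worst = ranked[0], ranked[-1]
        if best.score - worst.score >= margin:
            pairs.append((best, worst))
    return pairs

# Toy stand-in for the real scorer: rank by image_id length
# (illustration only; a real scorer would be a VLM-based evaluator).
toy_scorer = lambda c: float(len(c.image_id))

groups = [[EditCandidate("remove the car", "edit_long"),
           EditCandidate("remove the car", "e1")]]
pairs = build_preference_pairs(groups, toy_scorer)
print(len(pairs))  # 1 pair: "edit_long" preferred over "e1"
```

The same scorer callable could double as a per-sample reward in an RL loop (the abstract's second use of HP-Scorer), since both roles only require mapping an edited output to a scalar.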