ImgEdit: A Unified Image Editing Dataset and Benchmark

Abstract

Recent advances in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily because of the scarcity of high-quality data and the lack of adequate benchmarks. To overcome these limitations, we introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs that cover both novel and complex single-turn edits and challenging multi-turn tasks. To ensure data quality, we employ a multi-stage pipeline that integrates a cutting-edge vision-language model, a detection model, and a segmentation model, together with task-specific in-painting procedures and strict post-processing. ImgEdit surpasses existing datasets in both task novelty and data quality. Using ImgEdit, we train ImgEdit-E1, an editing model that uses a vision-language model to process the reference image and the editing prompt; it outperforms existing open-source models on multiple tasks, highlighting the value of both ImgEdit and the model design. For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark that evaluates image-editing performance in terms of instruction adherence, editing quality, and detail preservation. It includes a basic test suite, a challenging single-turn suite, and a dedicated multi-turn suite. We evaluate open-source and proprietary models, as well as ImgEdit-E1, and provide in-depth analysis and actionable insights into the current behavior of image-editing models. The source data are publicly available at https://github.com/PKU-YuanGroup/ImgEdit.