ImgEdit: A Unified Image Editing Dataset and Benchmark
May 26, 2025
Authors: Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, Li Yuan
cs.AI
Abstract
Recent advancements in generative models have enabled high-fidelity
text-to-image generation. However, open-source image-editing models still lag
behind their proprietary counterparts, primarily due to limited high-quality
data and insufficient benchmarks. To overcome these limitations, we introduce
ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2
million carefully curated edit pairs, which contain both novel and complex
single-turn edits, as well as challenging multi-turn tasks. To ensure data
quality, we employ a multi-stage pipeline that integrates a cutting-edge
vision-language model, a detection model, and a segmentation model, alongside
task-specific in-painting procedures and strict post-processing. ImgEdit
surpasses existing datasets in both task novelty and data quality. Using
ImgEdit, we train ImgEdit-E1, an editing model that uses a vision-language
model to process the reference image and editing prompt. It outperforms existing
open-source models on multiple tasks, highlighting the value of both the ImgEdit
dataset and the model design. For comprehensive evaluation, we introduce ImgEdit-Bench, a
benchmark designed to evaluate image editing performance in terms of
instruction adherence, editing quality, and detail preservation. It includes a
basic test suite, a challenging single-turn suite, and a dedicated multi-turn
suite. We evaluate both open-source and proprietary models, as well as
ImgEdit-E1, providing deep analysis and actionable insights into the current
behavior of image-editing models. The source data are publicly available on
https://github.com/PKU-YuanGroup/ImgEdit.