ImgEdit: A Unified Image Editing Dataset and Benchmark
May 26, 2025
Authors: Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, Li Yuan
cs.AI
Abstract
Recent advancements in generative models have enabled high-fidelity
text-to-image generation. However, open-source image-editing models still lag
behind their proprietary counterparts, primarily due to limited high-quality
data and insufficient benchmarks. To overcome these limitations, we introduce
ImgEdit, a large-scale, high-quality image-editing dataset of 1.2 million
carefully curated edit pairs that cover novel and complex single-turn edits as
well as challenging multi-turn tasks. To ensure data quality, we employ a
multi-stage pipeline that combines a cutting-edge vision-language model, a
detection model, and a segmentation model with task-specific in-painting
procedures and strict post-processing. ImgEdit surpasses existing datasets in
both task novelty and data quality. Using ImgEdit, we train ImgEdit-E1, an
editing model that uses a vision-language model to process the reference image
and the editing prompt. ImgEdit-E1 outperforms existing open-source models on
multiple tasks, highlighting the value of both ImgEdit and its model design.
For comprehensive evaluation, we introduce ImgEdit-Bench, a
benchmark that evaluates image-editing performance in terms of instruction
adherence, editing quality, and detail preservation. It includes a basic test
suite, a challenging single-turn suite, and a dedicated multi-turn suite. We
evaluate open-source and proprietary models, as well as ImgEdit-E1, and
provide in-depth analysis and actionable insights into the current behavior of
image-editing models. The source data are publicly available at
https://github.com/PKU-YuanGroup/ImgEdit.
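The abstract names the components of the data-curation pipeline (a vision-language model, a detection model, a segmentation model, task-specific in-painting, and strict post-processing) but not how they are wired together. As a minimal sketch, assuming each stage either enriches a candidate edit or rejects it, the pipeline can be modelled as a chain of filter functions in which a candidate becomes an edit pair only if no stage rejects it. Every name below (`Candidate`, `detect`, `instruct`, `segment`, `inpaint`, `post_filter`, `run_pipeline`), the stage order, the thresholds, and the toy data are hypothetical stand-ins, not the ImgEdit implementation; real stages would call the actual models instead of reading precomputed values.

```python
# Hypothetical illustration of a multi-stage curation pipeline; NOT the ImgEdit codebase.
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One source image moving through the (hypothetical) pipeline."""
    image_id: str
    objects: tuple[tuple[str, float, float], ...]  # (label, detector confidence, mask area fraction); toy values
    target: Optional[str] = None       # object chosen by the detection stage
    instruction: Optional[str] = None  # edit prompt written by the VLM stage
    mask_area: Optional[float] = None  # fraction of the image covered by the target mask
    edited: bool = False               # set once the in-painting stage has run


Stage = Callable[[Candidate], Optional[Candidate]]  # a stage returns None to reject


def detect(c: Candidate, min_conf: float = 0.5) -> Optional[Candidate]:
    # Stand-in for a detection model: pick the most confident object above a threshold.
    kept = [o for o in c.objects if o[1] >= min_conf]
    if not kept:
        return None
    return replace(c, target=max(kept, key=lambda o: o[1])[0])


def instruct(c: Candidate) -> Optional[Candidate]:
    # Stand-in for a vision-language model that writes the editing prompt.
    return replace(c, instruction=f"Remove the {c.target} from the image.")


def segment(c: Candidate) -> Optional[Candidate]:
    # Stand-in for a segmentation model: look up the target's precomputed mask area.
    area = next(a for label, _, a in c.objects if label == c.target)
    return replace(c, mask_area=area)


def inpaint(c: Candidate) -> Optional[Candidate]:
    # Stand-in for a task-specific in-painting procedure; here it only marks the edit as done.
    return replace(c, edited=True)


def post_filter(c: Candidate, lo: float = 0.01, hi: float = 0.5) -> Optional[Candidate]:
    # Strict post-processing (assumed rule): drop edits whose mask is implausibly small or large.
    if c.mask_area is not None and lo <= c.mask_area <= hi:
        return c
    return None


def run_pipeline(cands: Iterable[Candidate], stages: list[Stage]) -> list[Candidate]:
    """Apply stages in order; a candidate survives only if no stage rejects it."""
    kept: list[Candidate] = []
    for cand in cands:
        cur: Optional[Candidate] = cand
        for stage in stages:
            cur = stage(cur)
            if cur is None:
                break  # rejected; skip the remaining stages
        if cur is not None:
            kept.append(cur)
    return kept


if __name__ == "__main__":
    pool = [
        Candidate("img_001", (("dog", 0.92, 0.18), ("sofa", 0.40, 0.35))),
        Candidate("img_002", (("cup", 0.30, 0.02),)),       # no confident detection
        Candidate("img_003", (("building", 0.88, 0.80),)),  # mask too large
    ]
    pairs = run_pipeline(pool, [detect, instruct, segment, inpaint, post_filter])
    print([(p.image_id, p.instruction) for p in pairs])
    # [('img_001', 'Remove the dog from the image.')]
```

Modelling each stage as a function that may return `None` keeps the rejection logic uniform: cheap checks such as detection confidence can run first, so expensive stages like in-painting only see candidates that have already passed earlier filters.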