ImgEdit: A Unified Image Editing Dataset and Benchmark

May 26, 2025
Authors: Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, Li Yuan
cs.AI

Abstract

Recent advances in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily because of limited high-quality data and insufficient benchmarks. To overcome these limitations, we introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which cover both novel and complex single-turn edits and challenging multi-turn tasks. To ensure data quality, we employ a multi-stage pipeline that integrates a cutting-edge vision-language model, a detection model, and a segmentation model, alongside task-specific in-painting procedures and strict post-processing. ImgEdit surpasses existing datasets in both task novelty and data quality. Using ImgEdit, we train ImgEdit-E1, an editing model that uses a vision-language model to process the reference image and the editing prompt. ImgEdit-E1 outperforms existing open-source models on multiple tasks, highlighting the value of both ImgEdit and the model design. For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark designed to evaluate image-editing performance in terms of instruction adherence, editing quality, and detail preservation. It includes a basic test suite, a challenging single-turn suite, and a dedicated multi-turn suite. We evaluate both open-source and proprietary models, as well as ImgEdit-E1, providing in-depth analysis and actionable insights into the current behavior of image-editing models. The source data are publicly available at https://github.com/PKU-YuanGroup/ImgEdit.
