ImgEdit: A Unified Image Editing Dataset and Benchmark

Abstract

Recent advances in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily because of limited high-quality data and insufficient benchmarks. To overcome these limitations, we introduce ImgEdit, a large-scale, high-quality image-editing dataset of 1.2 million carefully curated edit pairs, covering both novel, complex single-turn edits and challenging multi-turn tasks. To ensure data quality, we employ a multi-stage pipeline that integrates a cutting-edge vision-language model, a detection model, and a segmentation model with task-specific in-painting procedures and strict post-processing. ImgEdit surpasses existing datasets in both task novelty and data quality. Using ImgEdit, we train ImgEdit-E1, an editing model that uses a vision-language model to process the reference image and the editing prompt. ImgEdit-E1 outperforms existing open-source models on multiple tasks, highlighting the value of both ImgEdit and the model design. For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark designed to evaluate image-editing performance in terms of instruction adherence, editing quality, and detail preservation. It includes a basic test suite, a challenging single-turn suite, and a dedicated multi-turn suite. We evaluate open-source and proprietary models, as well as ImgEdit-E1, providing in-depth analysis and actionable insights into the current behavior of image-editing models. The source data are publicly available at https://github.com/PKU-YuanGroup/ImgEdit.
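The data-quality pipeline in the abstract chains several models and filters, and a candidate edit pair can be discarded at any stage. The Python sketch below shows only that control flow and is not the ImgEdit implementation. The `Candidate` fields, the stage functions, the precomputed "model outputs" in `VLM_OUT`, `SEG_AREA`, and `INPAINT_QUALITY`, and the `min_area` and `threshold` values are all hypothetical placeholders standing in for real model calls.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Candidate:
    """A candidate edit pair moving through the pipeline (hypothetical schema)."""
    image_id: str
    instruction: str = ""
    mask_area: float = 0.0  # fraction of the image covered by the target mask
    quality: float = 0.0    # score assigned after in-painting


# A stage either returns an (possibly enriched) candidate or None to reject it.
Stage = Callable[[Candidate], Optional[Candidate]]


def run_pipeline(candidates: Iterable[Candidate], stages: List[Stage]) -> List[Candidate]:
    """Pass each candidate through every stage in order; keep only survivors."""
    kept: List[Candidate] = []
    for cand in candidates:
        current: Optional[Candidate] = cand
        for stage in stages:
            current = stage(current)
            if current is None:  # this stage rejected the candidate
                break
        if current is not None:
            kept.append(current)
    return kept


# Hypothetical precomputed outputs standing in for real models.
VLM_OUT = {"a": "remove the red car", "b": "replace the dog with a cat", "c": ""}
SEG_AREA = {"a": 0.12, "b": 0.004, "c": 0.30}
INPAINT_QUALITY = {"a": 0.91, "b": 0.88, "c": 0.95}


def vlm_instruction(c: Candidate) -> Optional[Candidate]:
    # Stand-in for a vision-language model proposing an edit instruction.
    text = VLM_OUT.get(c.image_id, "")
    return replace(c, instruction=text) if text else None


def detect_and_segment(c: Candidate, min_area: float = 0.01) -> Optional[Candidate]:
    # Stand-in for detection + segmentation: drop targets whose mask is too small.
    area = SEG_AREA[c.image_id]
    return replace(c, mask_area=area) if area >= min_area else None


def inpaint_and_score(c: Candidate) -> Optional[Candidate]:
    # Stand-in for task-specific in-painting followed by automatic scoring.
    return replace(c, quality=INPAINT_QUALITY[c.image_id])


def post_filter(c: Candidate, threshold: float = 0.85) -> Optional[Candidate]:
    # Strict post-processing: keep only results above a quality threshold.
    return c if c.quality >= threshold else None


if __name__ == "__main__":
    stages = [vlm_instruction, detect_and_segment, inpaint_and_score, post_filter]
    survivors = run_pipeline([Candidate("a"), Candidate("b"), Candidate("c")], stages)
    for s in survivors:
        print(s)
```

With these toy inputs, only candidate `a` survives. Candidate `b` is rejected because its mask covers too little of the image, and `c` is rejected because the stand-in VLM produced no instruction. Each stage is a function that returns `None` to reject, so stages can be reordered or a stricter post-processing check added without touching the driver loop.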