ImgEdit: A Unified Image Editing Dataset and Benchmark

Abstract

Recent advances in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily because of limited high-quality data and insufficient benchmarks. To overcome these limitations, we introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which cover both novel and complex single-turn edits and challenging multi-turn tasks. To ensure data quality, we employ a multi-stage pipeline that integrates a cutting-edge vision-language model, a detection model, and a segmentation model, alongside task-specific in-painting procedures and strict post-processing. ImgEdit surpasses existing datasets in both task novelty and data quality. Using ImgEdit, we train ImgEdit-E1, an editing model that uses a vision-language model to process the reference image and the editing prompt. ImgEdit-E1 outperforms existing open-source models on multiple tasks, highlighting the value of both ImgEdit and the model design. For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark designed to evaluate image-editing performance in terms of instruction adherence, editing quality, and detail preservation. It includes a basic test suite, a challenging single-turn suite, and a dedicated multi-turn suite. We evaluate open-source and proprietary models, as well as ImgEdit-E1, providing in-depth analysis and actionable insights into the current behavior of image-editing models. The source data are publicly available at https://github.com/PKU-YuanGroup/ImgEdit.
