GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

Abstract

Recent advances in large multimodal models such as GPT-4o have set a new standard for high-fidelity, instruction-guided image editing. However, the proprietary nature of these models and their training data poses a significant barrier to open-source research. To bridge this gap, we introduce GPT-IMAGE-EDIT-1.5M, a publicly available, large-scale image-editing corpus containing more than 1.5 million high-quality triplets (instruction, source image, edited image). We construct this dataset systematically by using GPT-4o to unify and refine three popular image-editing datasets: OmniEdit, HQ-Edit, and UltraEdit. Specifically, our methodology 1) regenerates output images to improve visual quality and instruction alignment, and 2) selectively rewrites prompts to improve semantic clarity. To validate the dataset's efficacy, we fine-tune advanced open-source models on GPT-IMAGE-EDIT-1.5M. The results are encouraging: for example, the fine-tuned FluxKontext achieves highly competitive performance across a comprehensive suite of benchmarks, scoring 7.24 on GEdit-EN, 3.80 on ImgEdit-Full, and 8.78 on Complex-Edit, with stronger instruction following and higher perceptual quality while preserving the source image's identity. These scores markedly exceed those of all previously published open-source methods and substantially narrow the gap to leading proprietary models. We hope that the full release of GPT-IMAGE-EDIT-1.5M will help catalyze further open research in instruction-guided image editing.
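As a rough illustration of the construction described in the abstract, the sketch below models each sample as an (instruction, source image, edited image) triplet, merges the source datasets into one stream, and applies the two refinement steps: regenerating the edited image, then rewriting the instruction only for selected samples. This is not the authors' pipeline. `EditTriplet`, `refine`, and the `regenerate`, `rewrite`, and `needs_rewrite` hooks are hypothetical names; the GPT-4o calls are replaced by caller-supplied functions, and both the order of the two steps and the rewrite criterion are assumptions.

```python
from dataclasses import dataclass, replace
from itertools import chain
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class EditTriplet:
    """One (instruction, source image, edited image) sample plus its provenance."""
    instruction: str
    source_image: str  # path/URI of the source image (hypothetical storage format)
    edited_image: str  # path/URI of the edited image
    origin: str        # source dataset, e.g. "OmniEdit", "HQ-Edit", "UltraEdit"


def refine(
    *datasets: Iterable[EditTriplet],
    regenerate: Callable[[str, str], str],
    rewrite: Callable[[str], str],
    needs_rewrite: Callable[[EditTriplet], bool],
) -> Iterator[EditTriplet]:
    """Merge several datasets and refine each triplet (hypothetical sketch).

    In the paper's setting, `regenerate` and `rewrite` would be backed by
    GPT-4o; here they are plain callables supplied by the caller.
    """
    for sample in chain(*datasets):
        # Step 1 (assumed first): replace the old edited image with a regenerated one.
        edited = regenerate(sample.source_image, sample.instruction)
        # Step 2: rewrite the instruction only for samples the filter flags.
        if needs_rewrite(sample):
            instruction = rewrite(sample.instruction)
        else:
            instruction = sample.instruction
        yield replace(sample, instruction=instruction, edited_image=edited)


# Usage with stub hooks standing in for model calls (all values are made up).
omni = [EditTriplet("make the sky purple", "omni/0001_src.png", "omni/0001_old.png", "OmniEdit")]
ultra = [EditTriplet("add hat", "ultra/0002_src.png", "ultra/0002_old.png", "UltraEdit")]

refined = list(refine(
    omni, ultra,
    regenerate=lambda src, instr: src.replace("_src", "_regen"),  # stub image editor
    rewrite=lambda instr: instr.capitalize() + ".",                 # stub prompt rewriter
    needs_rewrite=lambda s: len(s.instruction.split()) < 3,         # toy "too terse" filter
))

for t in refined:
    print(t.origin, t.instruction, t.edited_image)
# OmniEdit make the sky purple omni/0001_regen.png
# UltraEdit Add hat. ultra/0002_regen.png
```

Keeping the model calls behind plain callables makes the data flow testable without any API access, and the frozen dataclass plus `dataclasses.replace` keeps the original triplet untouched so provenance (`origin`) carries through unchanged.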
