GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
July 28, 2025
Authors: Yuhan Wang, Siwei Yang, Bingchen Zhao, Letian Zhang, Qing Liu, Yuyin Zhou, Cihang Xie
cs.AI
Abstract
Recent advancements in large multimodal models like GPT-4o have set a new
standard for high-fidelity, instruction-guided image editing. However, the
proprietary nature of these models and their training data creates a
significant barrier for open-source research. To bridge this gap, we introduce
GPT-IMAGE-EDIT-1.5M, a publicly available, large-scale image-editing corpus
containing more than 1.5 million high-quality triplets (instruction, source
image, edited image). We systematically construct this dataset by leveraging
the versatile capabilities of GPT-4o to unify and refine three popular
image-editing datasets: OmniEdit, HQ-Edit, and UltraEdit. Specifically, our
methodology involves 1) regenerating output images to enhance visual quality
and instruction alignment, and 2) selectively rewriting prompts to improve
semantic clarity. To validate the efficacy of our dataset, we fine-tune
advanced open-source models on GPT-IMAGE-EDIT-1.5M. The empirical results are
encouraging: for example, the fine-tuned FluxKontext achieves highly competitive
performance across a comprehensive suite of benchmarks, including 7.24 on
GEdit-EN, 3.80 on ImgEdit-Full, and 8.78 on Complex-Edit, showing stronger
instruction following and higher perceptual quality while maintaining identity.
These scores markedly exceed all previously published open-source methods and
substantially narrow the gap to leading proprietary models. We hope the full
release of GPT-IMAGE-EDIT-1.5M will help catalyze further open research in
instruction-guided image editing.
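
To make the triplet structure and the two refinement steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline or the released schema: the `EditTriplet` field names, the `refine` function, and the two callables (`regenerate_output`, `rewrite_instruction`) are all hypothetical, and the callables stand in for whatever image-editing model and prompt rewriter one would actually use.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class EditTriplet:
    """One example: (instruction, source image, edited image). Field names are hypothetical."""
    instruction: str
    source_image: str  # path or URI of the input image
    edited_image: str  # path or URI of the edited output
    origin: str        # source dataset, e.g. "OmniEdit", "HQ-Edit", "UltraEdit"


def refine(
    samples: Iterable[EditTriplet],
    regenerate_output: Callable[[str, str], str],
    rewrite_instruction: Optional[Callable[[EditTriplet], Optional[str]]] = None,
) -> List[EditTriplet]:
    """Apply the two steps named in the abstract to each sample.

    regenerate_output(source_image, instruction) returns a new edited image.
    rewrite_instruction(sample) returns a clearer prompt, or None to keep the
    existing one (this is what makes the rewriting selective).
    """
    refined: List[EditTriplet] = []
    for sample in samples:
        # Step 1: regenerate the edited image from the source image and instruction.
        new_output = regenerate_output(sample.source_image, sample.instruction)
        sample = replace(sample, edited_image=new_output)

        # Step 2: rewrite the prompt only when the rewriter proposes a replacement.
        if rewrite_instruction is not None:
            new_instruction = rewrite_instruction(sample)
            if new_instruction is not None:
                sample = replace(sample, instruction=new_instruction)

        refined.append(sample)
    return refined
```

Passing the model calls in as plain callables keeps the sketch independent of any particular API; in practice they would wrap the image-editing and text models of choice.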