GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

July 28, 2025
Authors: Yuhan Wang, Siwei Yang, Bingchen Zhao, Letian Zhang, Qing Liu, Yuyin Zhou, Cihang Xie
cs.AI

Abstract

Recent advancements in large multimodal models like GPT-4o have set a new standard for high-fidelity, instruction-guided image editing. However, the proprietary nature of these models and their training data creates a significant barrier for open-source research. To bridge this gap, we introduce GPT-IMAGE-EDIT-1.5M, a publicly available, large-scale image-editing corpus containing more than 1.5 million high-quality triplets (instruction, source image, edited image). We systematically construct this dataset by leveraging the versatile capabilities of GPT-4o to unify and refine three popular image-editing datasets: OmniEdit, HQ-Edit, and UltraEdit. Specifically, our methodology involves 1) regenerating output images to enhance visual quality and instruction alignment, and 2) selectively rewriting prompts to improve semantic clarity. To validate the efficacy of our dataset, we fine-tune advanced open-source models on GPT-IMAGE-EDIT-1.5M. The empirical results are exciting, e.g., the fine-tuned FluxKontext achieves highly competitive performance across a comprehensive suite of benchmarks, including 7.24 on GEdit-EN, 3.80 on ImgEdit-Full, and 8.78 on Complex-Edit, showing stronger instruction following and higher perceptual quality while maintaining identity. These scores markedly exceed all previously published open-source methods and substantially narrow the gap to leading proprietary models. We hope the full release of GPT-IMAGE-EDIT-1.5M can help to catalyze further open research in instruction-guided image editing.
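
To make the triplet structure and the two refinement steps concrete, here is a minimal Python sketch. It is not the authors' pipeline: the `EditTriplet` fields, the `refine` function, and the three callables (`regenerate`, `needs_rewrite`, `rewrite`) are hypothetical names introduced for illustration, and the order in which rewriting and regeneration happen is an assumption. The stubs at the bottom stand in for whatever model calls a real pipeline would make.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class EditTriplet:
    """One (instruction, source image, edited image) record.

    Images are represented by file names here purely for illustration.
    """
    instruction: str
    source_image: str
    edited_image: str
    origin: str  # source dataset, e.g. "OmniEdit", "HQ-Edit", "UltraEdit"


def refine(
    triplets: Iterable[EditTriplet],
    regenerate: Callable[[str, str], str],
    needs_rewrite: Callable[[EditTriplet], bool],
    rewrite: Callable[[str], str],
) -> Iterator[EditTriplet]:
    """Apply both refinement steps to every triplet.

    Assumed order: rewrite the prompt first (only when flagged),
    then regenerate the edited image from the source image.
    """
    for t in triplets:
        instruction = rewrite(t.instruction) if needs_rewrite(t) else t.instruction
        edited = regenerate(t.source_image, instruction)
        yield replace(t, instruction=instruction, edited_image=edited)


# Stub callables standing in for real model calls (hypothetical).
def stub_regenerate(source_image: str, instruction: str) -> str:
    return f"regen::{source_image}"


def stub_needs_rewrite(t: EditTriplet) -> bool:
    return not t.instruction.endswith(".")


def stub_rewrite(instruction: str) -> str:
    return instruction.strip().capitalize() + "."


raw = [
    EditTriplet("make the sky blue", "img_001.png", "old_001.png", "UltraEdit"),
    EditTriplet("Add a red hat to the dog.", "img_002.png", "old_002.png", "HQ-Edit"),
]
refined = list(refine(raw, stub_regenerate, stub_needs_rewrite, stub_rewrite))
```

Keeping the model calls behind plain callables means the same loop can be exercised with stubs, as above, before any image-generation backend is wired in.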