HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

April 15, 2024
Authors: Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie
cs.AI

Abstract

This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches that rely on attribute guidance or human feedback to build datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure high quality, diverse examples are first collected online and expanded, then used to create high-quality diptychs featuring input and output images with detailed text prompts; post-processing then ensures precise alignment. In addition, we propose two evaluation metrics, Alignment and Coherence, to quantitatively assess the quality of image edit pairs using GPT-4V. HQ-Edit's high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing models. For example, an InstructPix2Pix model fine-tuned on HQ-Edit can attain state-of-the-art image editing performance, even surpassing models fine-tuned with human-annotated data. The project page is https://thefllood.github.io/HQEdit_web.
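
To make the diptych step concrete, here is a minimal sketch of how one generated diptych might be separated into an input/output pair. It rests on assumptions the abstract does not confirm: that the two panels sit side by side and are equally wide, and that an image can be represented as a plain grid of RGB tuples. The function name `split_diptych` is hypothetical, and the post-processing described in the abstract (which ensures precise alignment) is not reproduced here.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels

def split_diptych(diptych: Image) -> Tuple[Image, Image]:
    """Split a side-by-side diptych into (input_image, output_image).

    Assumption (not stated in the abstract): the left half is the input
    image and the right half is the edited output, with equal widths.
    """
    if not diptych or not diptych[0]:
        raise ValueError("diptych must be a non-empty pixel grid")
    width = len(diptych[0])
    if any(len(row) != width for row in diptych):
        raise ValueError("all rows must have the same width")
    if width % 2 != 0:
        raise ValueError("expected an even width so both halves match")

    half = width // 2
    # Slice every row at the midpoint to obtain the two panels.
    input_image = [row[:half] for row in diptych]
    output_image = [row[half:] for row in diptych]
    return input_image, output_image
```

A real pipeline would more likely operate on image files through an imaging library and would follow the split with alignment and filtering; the list-based version above only illustrates the pairing idea.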
