HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
April 15, 2024
Authors: Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie
cs.AI
Abstract
This study introduces HQ-Edit, a high-quality instruction-based image editing
dataset with around 200,000 edits. Unlike prior approaches that rely on attribute
guidance or human feedback to build datasets, we devise a scalable data
collection pipeline leveraging advanced foundation models, namely GPT-4V and
DALL-E 3. To ensure high quality, diverse examples are first collected
online and expanded, then used to create high-quality diptychs featuring input
and output images with detailed text prompts. Post-processing then ensures
precise alignment. In addition, we propose two evaluation
metrics, Alignment and Coherence, to quantitatively assess the quality of image
edit pairs using GPT-4V. HQ-Edit's high-resolution images, rich in detail and
accompanied by comprehensive editing prompts, substantially enhance the
capabilities of existing image editing models. For example, an InstructPix2Pix
model fine-tuned on HQ-Edit can attain state-of-the-art image editing
performance, even surpassing models fine-tuned with human-annotated data.
The project page is https://thefllood.github.io/HQEdit_web.
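
The abstract mentions diptychs that hold an input image and an output image together. As a rough illustration only, the sketch below splits a side-by-side diptych into two equally sized halves. It assumes a left/right layout and a plain nested-list pixel representation; the function name `split_diptych` is hypothetical, and this is not the authors' post-processing, which the abstract does not describe in detail.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels (an assumed toy representation)


def split_diptych(diptych: Image) -> Tuple[Image, Image]:
    """Split a side-by-side diptych into (left, right) halves of equal width.

    Hypothetical helper: assumes the two panels sit left and right of each other.
    """
    if not diptych or len(diptych[0]) < 2:
        raise ValueError("diptych must be at least 2 pixels wide")
    width = len(diptych[0])
    if any(len(row) != width for row in diptych):
        raise ValueError("all rows must have the same width")
    half = width // 2
    # For an odd width, the middle column is dropped so both halves match in size.
    left = [row[:half] for row in diptych]
    right = [row[width - half:] for row in diptych]
    return left, right


if __name__ == "__main__":
    # Toy 5x2 image whose pixels encode their own (x, y) position.
    toy = [[(x, y, 0) for x in range(5)] for y in range(2)]
    left_half, right_half = split_diptych(toy)
    print(len(left_half[0]), len(right_half[0]))
```

A real pipeline would more likely operate on image arrays and would need extra steps (for example, alignment between the two halves), which are outside the scope of this sketch.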