UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
July 7, 2024
Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang
cs.AI
Abstract
This paper presents UltraEdit, a large-scale (approximately 4 million editing
samples), automatically generated dataset for instruction-based image editing.
Our key idea is to address the drawbacks in existing image editing datasets
like InstructPix2Pix and MagicBrush, and provide a systematic approach to
producing massive and high-quality image editing samples. UltraEdit offers
several distinct advantages: 1) It features a broader range of editing
instructions by leveraging the creativity of large language models (LLMs)
alongside in-context editing examples from human raters; 2) Its data sources
are based on real images, including photographs and artworks, which provide
greater diversity and reduced bias compared to datasets solely generated by
text-to-image models; 3) It also supports region-based editing, enhanced by
high-quality, automatically produced region annotations. Our experiments show
that canonical diffusion-based editing baselines trained on UltraEdit set new
records on MagicBrush and Emu-Edit benchmarks. Our analysis further confirms
the crucial role of real image anchors and region-based editing data. The
dataset, code, and models can be found at https://ultra-editing.github.io.
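Each UltraEdit sample pairs a real source image with a free-form editing instruction, the edited result, and (for the region-based subset) an automatically produced region mask. A minimal sketch for inspecting such samples with the Hugging Face `datasets` library follows; note that the dataset identifier and field names below are assumptions for illustration, not the released schema (see the project page for the actual identifiers):

```python
# Minimal sketch for inspecting UltraEdit-style editing samples.
# NOTE: the dataset ID and the field names are assumptions --
# consult https://ultra-editing.github.io for the released identifiers.
from datasets import load_dataset

# Streaming avoids downloading all ~4M samples up front.
ds = load_dataset("<ultraedit-dataset-id>", split="train", streaming=True)

sample = next(iter(ds))
print(sample["edit_prompt"])       # hypothetical field: the editing instruction
source = sample["source_image"]    # hypothetical field: real-image anchor (PIL.Image)
edited = sample["edited_image"]    # hypothetical field: edited result (PIL.Image)
mask = sample.get("mask")          # hypothetical field: region annotation, if present
```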
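The "canonical diffusion-based editing baselines" are InstructPix2Pix-style models, which condition a diffusion model jointly on the source image and the text instruction. As a sketch of what inference with such a baseline looks like, the snippet below uses the public `diffusers` InstructPix2Pix pipeline; the checkpoint shown is the original InstructPix2Pix release rather than the UltraEdit-trained weights (those are linked from the project page), and the file names and sampling parameters are illustrative. A region-based variant would additionally consume a mask restricting where edits may occur.

```python
# Sketch of instruction-based editing with an InstructPix2Pix-style pipeline.
# A model fine-tuned on UltraEdit would be loaded the same way; the checkpoint
# below is the public InstructPix2Pix baseline, used here for illustration.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

source = Image.open("photo.png").convert("RGB")  # illustrative local file

edited = pipe(
    prompt="make the sky look like a sunset",    # free-form editing instruction
    image=source,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how closely to preserve the source image
    guidance_scale=7.0,        # how strongly to follow the instruction
).images[0]
edited.save("edited.png")
```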