UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
July 7, 2024
Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang
cs.AI
Abstract
This paper presents UltraEdit, a large-scale (approximately 4 million editing
samples), automatically generated dataset for instruction-based image editing.
Our key idea is to address the drawbacks in existing image editing datasets
like InstructPix2Pix and MagicBrush, and provide a systematic approach to
producing massive and high-quality image editing samples. UltraEdit offers
several distinct advantages: 1) It features a broader range of editing
instructions by leveraging the creativity of large language models (LLMs)
alongside in-context editing examples from human raters; 2) Its data sources
are based on real images, including photographs and artworks, which provide
greater diversity and reduced bias compared to datasets solely generated by
text-to-image models; 3) It also supports region-based editing, enhanced by
high-quality, automatically produced region annotations. Our experiments show
that canonical diffusion-based editing baselines trained on UltraEdit set new
records on MagicBrush and Emu-Edit benchmarks. Our analysis further confirms
the crucial role of real image anchors and region-based editing data. The
dataset, code, and models can be found at https://ultra-editing.github.io.
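Each UltraEdit sample pairs a real source image with a free-form editing instruction, the edited result, and (for the region-based subset) an automatically produced region mask. A minimal sketch for inspecting such samples with the Hugging Face `datasets` library follows; note that the dataset identifier and field names below are assumptions for illustration, not the released schema (see the project page for the actual identifiers):

```python
# Minimal sketch for inspecting UltraEdit-style editing samples.
# NOTE: the dataset ID and the field names are assumptions --
# consult https://ultra-editing.github.io for the released identifiers.
from datasets import load_dataset

# Streaming avoids downloading all ~4M samples up front.
ds = load_dataset("<ultraedit-dataset-id>", split="train", streaming=True)

sample = next(iter(ds))
print(sample["edit_prompt"])       # hypothetical field: the editing instruction
source = sample["source_image"]    # hypothetical field: real-image anchor (PIL.Image)
edited = sample["edited_image"]    # hypothetical field: edited result (PIL.Image)
mask = sample.get("mask")          # hypothetical field: region annotation, if present
```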
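The "canonical diffusion-based editing baselines" are InstructPix2Pix-style models, which condition a diffusion model jointly on the source image and the text instruction. As a sketch of what inference with such a baseline looks like, the snippet below uses the public `diffusers` InstructPix2Pix pipeline; the checkpoint shown is the original InstructPix2Pix release rather than the UltraEdit-trained weights (those are linked from the project page), and the file names and sampling parameters are illustrative. A region-based variant would additionally consume a mask restricting where edits may occur.

```python
# Sketch of instruction-based editing with an InstructPix2Pix-style pipeline.
# A model fine-tuned on UltraEdit would be loaded the same way; the checkpoint
# below is the public InstructPix2Pix baseline, used here for illustration.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

source = Image.open("photo.png").convert("RGB")  # illustrative local file

edited = pipe(
    prompt="make the sky look like a sunset",    # free-form editing instruction
    image=source,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how closely to preserve the source image
    guidance_scale=7.0,        # how strongly to follow the instruction
).images[0]
edited.save("edited.png")
```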