

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

July 7, 2024
Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang
cs.AI

Abstract

This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks of existing image editing datasets like InstructPix2Pix and MagicBrush, and to provide a systematic approach to producing massive, high-quality image editing samples. UltraEdit offers several distinct advantages: 1) It features a broader range of editing instructions by leveraging the creativity of large language models (LLMs) alongside in-context editing examples from human raters; 2) Its data sources are based on real images, including photographs and artworks, which provide greater diversity and reduced bias compared to datasets solely generated by text-to-image models; 3) It also supports region-based editing, enhanced by high-quality, automatically produced region annotations. Our experiments show that canonical diffusion-based editing baselines trained on UltraEdit set new records on the MagicBrush and Emu-Edit benchmarks. Our analysis further confirms the crucial role of real image anchors and region-based editing data. The dataset, code, and models can be found at https://ultra-editing.github.io.
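To make the region-based editing idea concrete, the sketch below shows a toy editing sample and how a region mask constrains an edit: pixels inside the mask may be replaced by the edited result, while pixels outside it are kept from the source. The field names and the 2x2 "images" are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical structure of an UltraEdit-style editing sample
# (field names are assumptions for illustration only).
sample = {
    "source_image": "photo_001.jpg",
    "edit_instruction": "Turn the red car into a blue car",
    "region_mask": [[0, 1], [1, 0]],  # 1 = pixels the edit may change
    "target_image": "photo_001_edited.jpg",
}

def apply_region_edit(source, edited, mask):
    """Blend the edited image into the source only where mask == 1,
    mimicking region-based editing: pixels outside the region stay intact."""
    return [
        [e if m else s for s, e, m in zip(src_row, ed_row, m_row)]
        for src_row, ed_row, m_row in zip(source, edited, mask)
    ]

# Toy 2x2 grayscale "images": the edit sets every pixel to 99,
# but the mask restricts which pixels actually change.
source = [[10, 20], [30, 40]]
edited = [[99, 99], [99, 99]]
result = apply_region_edit(source, edited, sample["region_mask"])
print(result)  # [[10, 99], [99, 40]]
```

In the real dataset the masks are automatically produced region annotations over full-resolution images, but the constraint they impose on the editing model is the same as in this toy blend.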
