
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

June 16, 2023
Authors: Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, Yu Su
cs.AI

Abstract

Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are either zero-shot or trained on an automatically synthesized dataset, which contains a high volume of noise. Thus, they still require lots of manual tuning to produce desirable outcomes in practice. To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing. MagicBrush comprises over 10K manually annotated triples (source image, instruction, target image), which supports training large-scale text-guided image editing models. We fine-tune InstructPix2Pix on MagicBrush and show that the new model can produce much better images according to human evaluation. We further conduct extensive experiments to evaluate current image editing baselines from multiple dimensions including quantitative, qualitative, and human evaluations. The results reveal the challenging nature of our dataset and the gap between current baselines and real-world editing needs.
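The abstract describes the dataset as (source image, instruction, target image) triples, with multi-turn sessions chaining edits and an optional mask in the mask-provided setting. A minimal sketch of how such records might be represented — the field names and file names here are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical record layout for one MagicBrush-style editing triple.
# Field names are illustrative assumptions, not the released schema.
@dataclass
class EditTriple:
    source_image: str           # path to the input image
    instruction: str            # natural-language edit instruction
    target_image: str           # path to the manually produced result
    mask: Optional[str] = None  # edit-region mask (mask-provided setting only)

# A multi-turn session chains triples: each turn's target image
# becomes the next turn's source image.
session: List[EditTriple] = [
    EditTriple("img_000.png", "add a red scarf to the dog", "img_000_t1.png"),
    EditTriple("img_000_t1.png", "make the background snowy", "img_000_t2.png"),
]

# Sanity check that turns are chained source -> target.
for prev, curr in zip(session, session[1:]):
    assert curr.source_image == prev.target_image
```

Fine-tuning a model such as InstructPix2Pix on such data would iterate over these triples, conditioning on the source image and instruction and supervising against the target image.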