
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

June 16, 2023
Authors: Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, Yu Su
cs.AI

Abstract

Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are either zero-shot or trained on an automatically synthesized dataset, which contains a high volume of noise. Thus, they still require lots of manual tuning to produce desirable outcomes in practice. To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing. MagicBrush comprises over 10K manually annotated triples (source image, instruction, target image), which supports training large-scale text-guided image editing models. We fine-tune InstructPix2Pix on MagicBrush and show that the new model can produce much better images according to human evaluation. We further conduct extensive experiments to evaluate current image editing baselines from multiple dimensions including quantitative, qualitative, and human evaluations. The results reveal the challenging nature of our dataset and the gap between current baselines and real-world editing needs.
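
The abstract describes each MagicBrush example as a (source image, instruction, target image) triple and reports fine-tuning InstructPix2Pix on these triples. The Python sketch below is a minimal illustration of that workflow under stated assumptions: it loads one triple and runs the publicly released InstructPix2Pix checkpoint on it. The Hub dataset ID (`osunlp/MagicBrush`), split name, and column names are assumptions not confirmed by this page, and the call shown is the standard diffusers inference API rather than the authors' fine-tuning code.

```python
# Illustrative sketch only: the hub ID, split name, and column names below are
# assumptions about how MagicBrush is distributed, not taken from the abstract.
import torch
from datasets import load_dataset
from diffusers import StableDiffusionInstructPix2PixPipeline

# Load one manually annotated (source image, instruction, target image) triple.
# "osunlp/MagicBrush" and the "dev" split are assumed identifiers.
example = load_dataset("osunlp/MagicBrush", split="dev")[0]
source_image = example["source_img"]   # assumed column name (PIL source image)
instruction = example["instruction"]   # assumed column name (edit instruction)
reference = example["target_img"]      # assumed column name (human-edited target)

# Run the public InstructPix2Pix checkpoint on this triple; the paper fine-tunes
# this model on MagicBrush, but no fine-tuned weights are assumed here.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

edited = pipe(
    prompt=instruction,
    image=source_image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher values keep the output closer to the source
).images[0]

edited.save("edited.png")
reference.save("reference.png")  # compare against the human-annotated target
```

Comparing the model output with the human-annotated target image in this way mirrors the qualitative and human evaluations described in the abstract.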