MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
September 18, 2025
作者: Mingsong Li, Lin Liu, Hongjun Wang, Haoxing Chen, Xijun Gu, Shizhan Liu, Dong Gong, Junbo Zhao, Zhenzhong Lan, Jianguo Li
cs.AI
Abstract
Current instruction-based image editing (IBIE) methods struggle with
challenging editing tasks, as both editing types and sample counts of existing
datasets are limited. Moreover, traditional dataset construction pipelines often
yield noisy image-caption pairs, which may introduce biases and limit model
capabilities in complex editing scenarios. To address these limitations, we
introduce MultiEdit, a comprehensive dataset featuring over 107K high-quality
image editing samples. It encompasses 6 challenging editing tasks through a
diverse collection of 18 non-style-transfer editing types and 38 style transfer
operations, covering a spectrum from sophisticated style transfer to complex
semantic operations like person reference editing and in-image text editing. We
employ a novel dataset construction pipeline that utilizes two multi-modal
large language models (MLLMs) to generate visual-adaptive editing instructions
and produce high-fidelity edited images, respectively. Extensive experiments
demonstrate that fine-tuning foundational open-source models with our
MultiEdit-Train set substantially improves their performance on sophisticated
editing tasks in our proposed MultiEdit-Test benchmark, while effectively
preserving their capabilities on the standard editing benchmark. We believe
MultiEdit provides a valuable resource for advancing research into more diverse
and challenging IBIE capabilities. Our dataset is available at
https://huggingface.co/datasets/inclusionAI/MultiEdit.
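
For readers who want to experiment with the released data, the sketch below shows one way to load it with the Hugging Face `datasets` library. The split name and per-sample field names are assumptions, not confirmed by the abstract; consult the dataset card at the URL above for the actual schema.

```python
# Minimal sketch: loading MultiEdit from the Hugging Face Hub.
# The split name and the field names referenced below are assumptions;
# check the dataset card before relying on them.
from datasets import load_dataset

ds = load_dataset("inclusionAI/MultiEdit", split="train")  # assumed split name

print(ds)        # inspect the declared features and sample count
sample = ds[0]   # a single editing sample, returned as a Python dict

# Hypothetical field names, for illustration only:
# sample["source_image"]  -> original image
# sample["edited_image"]  -> edited result produced by the pipeline
# sample["instruction"]   -> visual-adaptive editing instruction
# sample["edit_type"]     -> one of the non-style-transfer or style-transfer types
```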