MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
September 18, 2025
作者: Mingsong Li, Lin Liu, Hongjun Wang, Haoxing Chen, Xijun Gu, Shizhan Liu, Dong Gong, Junbo Zhao, Zhenzhong Lan, Jianguo Li
cs.AI
Abstract
Current instruction-based image editing (IBIE) methods struggle with
challenging editing tasks, as both editing types and sample counts of existing
datasets are limited. Moreover, traditional dataset construction pipelines often
yield noisy image-caption pairs, which may introduce biases and limit model
capabilities in complex editing scenarios. To address these limitations, we
introduce MultiEdit, a comprehensive dataset featuring over 107K high-quality
image editing samples. It encompasses 6 challenging editing tasks through a
diverse collection of 18 non-style-transfer editing types and 38 style transfer
operations, covering a spectrum from sophisticated style transfer to complex
semantic operations like person reference editing and in-image text editing. We
employ a novel dataset construction pipeline that utilizes two multi-modal
large language models (MLLMs) to generate visual-adaptive editing instructions
and produce high-fidelity edited images, respectively. Extensive experiments
demonstrate that fine-tuning foundational open-source models with our
MultiEdit-Train set substantially improves their performance on sophisticated
editing tasks in our proposed MultiEdit-Test benchmark, while effectively
preserving their capabilities on the standard editing benchmark. We believe
MultiEdit provides a valuable resource for advancing research into more diverse
and challenging IBIE capabilities. Our dataset is available at
https://huggingface.co/datasets/inclusionAI/MultiEdit.
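
For readers who want to experiment with the released data, the sketch below shows one way to load it with the Hugging Face `datasets` library. The split name and per-sample field names are assumptions, not confirmed by the abstract; consult the dataset card at the URL above for the actual schema.

```python
# Minimal sketch: loading MultiEdit from the Hugging Face Hub.
# The split name and the field names referenced below are assumptions;
# check the dataset card before relying on them.
from datasets import load_dataset

ds = load_dataset("inclusionAI/MultiEdit", split="train")  # assumed split name

print(ds)        # inspect the declared features and sample count
sample = ds[0]   # a single editing sample, returned as a Python dict

# Hypothetical field names, for illustration only:
# sample["source_image"]  -> original image
# sample["edited_image"]  -> edited result produced by the pipeline
# sample["instruction"]   -> visual-adaptive editing instruction
# sample["edit_type"]     -> one of the non-style-transfer or style-transfer types
```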