MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
September 18, 2025
Authors: Mingsong Li, Lin Liu, Hongjun Wang, Haoxing Chen, Xijun Gu, Shizhan Liu, Dong Gong, Junbo Zhao, Zhenzhong Lan, Jianguo Li
cs.AI
Abstract
Current instruction-based image editing (IBIE) methods struggle with challenging editing tasks, as both the editing types and the sample counts of existing datasets are limited. Moreover, traditional dataset construction pipelines often produce noisy image-caption pairs, which may introduce biases and limit model capabilities in complex editing scenarios. To address these limitations, we introduce MultiEdit, a comprehensive dataset featuring over 107K high-quality image editing samples. It encompasses 6 challenging editing tasks through a diverse collection of 18 non-style-transfer editing types and 38 style transfer operations, covering a spectrum from sophisticated style transfer to complex semantic operations such as person reference editing and in-image text editing. We employ a novel dataset construction pipeline that uses two multi-modal large language models (MLLMs), one to generate visual-adaptive editing instructions and the other to produce high-fidelity edited images. Extensive experiments demonstrate that fine-tuning foundational open-source models on our MultiEdit-Train set substantially improves their performance on sophisticated editing tasks in our proposed MultiEdit-Test benchmark, while effectively preserving their capabilities on the standard editing benchmark. We believe MultiEdit provides a valuable resource for advancing research into more diverse and challenging IBIE capabilities. Our dataset is available at https://huggingface.co/datasets/inclusionAI/MultiEdit.