Step1X-Edit: A Practical Framework for General Image Editing

April 24, 2025
Authors: Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chunrui Han, Guopeng Li, Yuang Peng, Quan Sun, Jingwei Wu, Yan Cai, Zheng Ge, Ranchen Ming, Lei Xia, Xianfang Zeng, Yibo Zhu, Binxing Jiao, Xiangyu Zhang, Gang Yu, Daxin Jiang
cs.AI

Abstract

In recent years, image editing models have developed remarkably quickly. Recently released cutting-edge multimodal models such as GPT-4o and Gemini2 Flash have introduced highly promising image editing capabilities. These models fulfill the vast majority of user-driven editing requirements, marking a significant advance in the field of image manipulation. However, a large gap remains between open-source algorithms and these closed-source models. In this paper, we therefore aim to release a state-of-the-art image editing model, called Step1X-Edit, whose performance is comparable to that of closed-source models such as GPT-4o and Gemini2 Flash. More specifically, we adopt a multimodal LLM to process the reference image and the user's editing instruction. A latent embedding is extracted and integrated with a diffusion image decoder to obtain the target image. To train the model, we build a data generation pipeline that produces a high-quality dataset. For evaluation, we develop GEdit-Bench, a novel benchmark rooted in real-world user instructions. Experimental results on GEdit-Bench demonstrate that Step1X-Edit outperforms existing open-source baselines by a substantial margin and approaches the performance of leading proprietary models, thereby making a significant contribution to the field of image editing.
