DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
July 5, 2023
Authors: Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang
cs.AI
Abstract
Despite the ability of existing large-scale text-to-image (T2I) models to
generate high-quality images from detailed textual descriptions, they often
lack the capability to precisely edit generated or real images. In this paper,
we propose a novel image editing method, DragonDiffusion, enabling Drag-style
manipulation on Diffusion models. Specifically, we construct classifier
guidance based on the strong correspondence of intermediate features in the
diffusion model. This guidance transforms the editing signals into gradients
via a feature correspondence loss and uses them to modify the intermediate
representation of the diffusion model. Based on this guidance strategy, we also
build multi-scale guidance that considers both semantic and geometric
alignment. Moreover, a cross-branch self-attention mechanism is added to
maintain consistency between the original image and the editing result.
Through an efficient design, our method achieves various editing modes for
generated or real images, such as object moving, object resizing, object
appearance replacement, and content dragging.
It is worth noting that all editing and content preservation signals come from
the image itself, and the model does not require fine-tuning or additional
modules. Our source code will be available at
https://github.com/MC-E/DragonDiffusion.
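
To make the guidance idea above more concrete, the minimal sketch below (an
illustration under stated assumptions, not the authors' released code) shows
how a feature correspondence loss between a source region of the reference
image and a target region of the edited image could be turned into a gradient
on the noisy latent at each denoising step. The unet_features hook, the mask
tensors, and the guidance weight are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def correspondence_loss(feat_gen, feat_ref, mask_tgt, mask_src):
    """Cosine-similarity loss between masked-pooled features of a source region
    in the reference image and a target region in the image being edited."""
    # Masked average pooling over spatial dims -> one descriptor per image.
    f_src = (feat_ref * mask_src).sum(dim=(2, 3)) / mask_src.sum(dim=(2, 3)).clamp(min=1e-6)
    f_tgt = (feat_gen * mask_tgt).sum(dim=(2, 3)) / mask_tgt.sum(dim=(2, 3)).clamp(min=1e-6)
    # Loss shrinks as the target region reproduces the reference content.
    return 1.0 - F.cosine_similarity(f_src, f_tgt, dim=1).mean()

def guided_step(z_t, t, unet_features, feats_ref, mask_tgt, mask_src, weight=1.0):
    """One classifier-guidance-style update: differentiate the correspondence
    loss w.r.t. the noisy latent and nudge the latent against that gradient."""
    z_t = z_t.detach().requires_grad_(True)
    # Hypothetical hook returning intermediate UNet features for the latent z_t at step t.
    feats_gen = unet_features(z_t, t)
    loss = correspondence_loss(feats_gen, feats_ref, mask_tgt, mask_src)
    grad = torch.autograd.grad(loss, z_t)[0]
    return (z_t - weight * grad).detach()
```

In the full method, this kind of guidance is applied at multiple feature
scales to account for both semantic and geometric alignment, and a
cross-branch self-attention keeps unedited content consistent with the
original image; the sketch only illustrates the single-scale gradient-guidance
mechanism.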