DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
June 26, 2023
Authors: Yujun Shi, Chuhui Xue, Jiachun Pan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai
cs.AI
Abstract
Precise and controllable image editing is a challenging task that has attracted significant attention. Recently, DragGAN introduced an interactive point-based image editing framework and achieved impressive editing results with pixel-level precision. However, since this method is based on generative adversarial networks (GANs), its generality is upper-bounded by the capacity of the pre-trained GAN models. In this work, we extend such an editing framework to diffusion models and propose DragDiffusion. By leveraging large-scale pretrained diffusion models, we greatly improve the applicability of interactive point-based editing in real-world scenarios. While most existing diffusion-based image editing methods work on text embeddings, DragDiffusion optimizes the diffusion latent to achieve precise spatial control. Although diffusion models generate images in an iterative manner, we empirically show that optimizing the diffusion latent at a single step suffices to generate coherent results, enabling DragDiffusion to complete high-quality editing efficiently. Extensive experiments across a wide range of challenging cases (e.g., multiple objects, diverse object categories, and various styles) demonstrate the versatility and generality of DragDiffusion.
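To make the abstract's central idea concrete, the sketch below illustrates point-based editing by optimizing the diffusion latent at one single timestep: the latent is the only optimization variable, and a few gradient steps drag content from a user-chosen handle point toward a target point. This is a minimal illustration under stated assumptions, not the paper's implementation: the small convolutional feature_net is a hypothetical stand-in for the features of a pretrained diffusion UNet, and the L1 feature-matching loss, the handle/target points, and all hyperparameters are placeholders chosen for the example.

```python
# Minimal sketch (assumption-laden): optimize a single-timestep diffusion
# latent so that content at a "handle" point is dragged toward a "target"
# point. The real method operates on a large pretrained diffusion model;
# here a tiny frozen conv net stands in for its feature extractor.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for diffusion UNet features at a fixed timestep t.
feature_net = torch.nn.Sequential(
    torch.nn.Conv2d(4, 32, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(32, 32, 3, padding=1),
)
for p in feature_net.parameters():
    p.requires_grad_(False)

# Latent at the chosen timestep (e.g., obtained by inverting the input image);
# this single latent is the only quantity being optimized.
latent = torch.randn(1, 4, 64, 64, requires_grad=True)
optimizer = torch.optim.Adam([latent], lr=1e-2)

handle = (20, 20)   # current point (row, col) in latent resolution
target = (28, 28)   # where the user wants that point to end up

def point_feature(feat, yx):
    """Bilinearly sample a feature vector at a (row, col) location."""
    h, w = feat.shape[-2:]
    # grid_sample expects normalized (x, y) coordinates in [-1, 1]
    grid = torch.tensor([[[[2 * yx[1] / (w - 1) - 1,
                            2 * yx[0] / (h - 1) - 1]]]], dtype=feat.dtype)
    return F.grid_sample(feat, grid, align_corners=True).view(-1)

# Reference feature at the handle point, taken from the unedited latent.
with torch.no_grad():
    ref = point_feature(feature_net(latent), handle)

for step in range(50):
    optimizer.zero_grad()
    feat = feature_net(latent)
    # Drag-style objective: the feature at the TARGET location should come to
    # match the original feature at the HANDLE location, moving content over.
    loss = F.l1_loss(point_feature(feat, target), ref)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}")
# In the actual method, the optimized latent would then be denoised by the
# pretrained diffusion model to produce the edited image.
```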