DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
June 26, 2023
Authors: Yujun Shi, Chuhui Xue, Jiachun Pan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai
cs.AI
Abstract
Precise and controllable image editing is a challenging task that has attracted significant attention. Recently, DragGAN introduced an interactive point-based image editing framework and achieved impressive editing results with pixel-level precision. However, since this method is based on generative adversarial networks (GANs), its generality is upper-bounded by the capacity of the pre-trained GAN models. In this work, we extend such an editing framework to diffusion models and propose DragDiffusion. By leveraging large-scale pretrained diffusion models, we greatly improve the applicability of interactive point-based editing in real-world scenarios. While most existing diffusion-based image editing methods work on text embeddings, DragDiffusion optimizes the diffusion latent to achieve precise spatial control. Although diffusion models generate images in an iterative manner, we empirically show that optimizing the diffusion latent at a single step suffices to generate coherent results, enabling DragDiffusion to complete high-quality editing efficiently. Extensive experiments across a wide range of challenging cases (e.g., multiple objects, diverse object categories, and various styles) demonstrate the versatility and generality of DragDiffusion.
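To make the abstract's central idea concrete, the sketch below illustrates point-based editing by optimizing the diffusion latent at one single timestep: the latent is the only optimization variable, and a few gradient steps drag content from a user-chosen handle point toward a target point. This is a minimal illustration under stated assumptions, not the paper's implementation: the small convolutional feature_net is a hypothetical stand-in for the features of a pretrained diffusion UNet, and the L1 feature-matching loss, the handle/target points, and all hyperparameters are placeholders chosen for the example.

```python
# Minimal sketch (assumption-laden): optimize a single-timestep diffusion
# latent so that content at a "handle" point is dragged toward a "target"
# point. The real method operates on a large pretrained diffusion model;
# here a tiny frozen conv net stands in for its feature extractor.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for diffusion UNet features at a fixed timestep t.
feature_net = torch.nn.Sequential(
    torch.nn.Conv2d(4, 32, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(32, 32, 3, padding=1),
)
for p in feature_net.parameters():
    p.requires_grad_(False)

# Latent at the chosen timestep (e.g., obtained by inverting the input image);
# this single latent is the only quantity being optimized.
latent = torch.randn(1, 4, 64, 64, requires_grad=True)
optimizer = torch.optim.Adam([latent], lr=1e-2)

handle = (20, 20)   # current point (row, col) in latent resolution
target = (28, 28)   # where the user wants that point to end up

def point_feature(feat, yx):
    """Bilinearly sample a feature vector at a (row, col) location."""
    h, w = feat.shape[-2:]
    # grid_sample expects normalized (x, y) coordinates in [-1, 1]
    grid = torch.tensor([[[[2 * yx[1] / (w - 1) - 1,
                            2 * yx[0] / (h - 1) - 1]]]], dtype=feat.dtype)
    return F.grid_sample(feat, grid, align_corners=True).view(-1)

# Reference feature at the handle point, taken from the unedited latent.
with torch.no_grad():
    ref = point_feature(feature_net(latent), handle)

for step in range(50):
    optimizer.zero_grad()
    feat = feature_net(latent)
    # Drag-style objective: the feature at the TARGET location should come to
    # match the original feature at the HANDLE location, moving content over.
    loss = F.l1_loss(point_feature(feat, target), ref)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}")
# In the actual method, the optimized latent would then be denoised by the
# pretrained diffusion model to produce the edited image.
```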