DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

June 26, 2023
Authors: Yujun Shi, Chuhui Xue, Jiachun Pan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai
cs.AI

Abstract

Precise and controllable image editing is a challenging task that has attracted significant attention. Recently, DragGAN enabled an interactive point-based image editing framework and achieved impressive editing results with pixel-level precision. However, since this method is based on generative adversarial networks (GANs), its generality is upper-bounded by the capacity of the pre-trained GAN models. In this work, we extend such an editing framework to diffusion models and propose DragDiffusion. By leveraging large-scale pretrained diffusion models, we greatly improve the applicability of interactive point-based editing in real-world scenarios. While most existing diffusion-based image editing methods work on text embeddings, DragDiffusion optimizes the diffusion latent to achieve precise spatial control. Although diffusion models generate images in an iterative manner, we empirically show that optimizing the diffusion latent at a single step suffices to generate coherent results, enabling DragDiffusion to complete high-quality edits efficiently. Extensive experiments across a wide range of challenging cases (e.g., multiple objects, diverse object categories, various styles, etc.) demonstrate the versatility and generality of DragDiffusion.
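
To make the core idea from the abstract concrete, below is a minimal PyTorch sketch of "dragging" a point by gradient descent on the latent of a single diffusion step. It is illustrative only: ToyFeatureExtractor, sample_feature, drag_latent_one_step, the L1 feature loss, and all hyperparameters are assumptions standing in for the paper's actual components (features from a large-scale pretrained diffusion UNet and the full optimization procedure), not the authors' implementation.

import torch
import torch.nn.functional as F

# Toy stand-in for the intermediate features a pretrained diffusion UNet would
# produce at the chosen denoising step; layers and shapes here are illustrative.
class ToyFeatureExtractor(torch.nn.Module):
    def __init__(self, channels: int = 4, feat_dim: int = 32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(channels, feat_dim, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        return self.net(latent)

def sample_feature(feat: torch.Tensor, point: torch.Tensor) -> torch.Tensor:
    # Bilinearly sample a feature vector at a (row, col) location given in pixels.
    _, _, h, w = feat.shape
    grid = torch.stack(
        [point[1] / (w - 1) * 2 - 1, point[0] / (h - 1) * 2 - 1]
    ).view(1, 1, 1, 2)
    return F.grid_sample(feat, grid, align_corners=True).view(-1)

def drag_latent_one_step(latent, feature_extractor, handle, target,
                         n_iters: int = 80, lr: float = 1e-2):
    # Optimize the latent of a single denoising step so that the feature found
    # at the handle point is reproduced at the target point, i.e. the content
    # under the handle is "dragged" toward the target. This is a deliberately
    # simplified objective, not the paper's exact loss.
    latent = latent.detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latent], lr=lr)
    with torch.no_grad():
        reference = sample_feature(feature_extractor(latent), handle)
    for _ in range(n_iters):
        feat = feature_extractor(latent)
        loss = F.l1_loss(sample_feature(feat, target), reference)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return latent.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    latent = torch.randn(1, 4, 64, 64)   # Stable-Diffusion-sized latent, for illustration
    extractor = ToyFeatureExtractor()
    handle = torch.tensor([20.0, 20.0])  # point to drag, (row, col)
    target = torch.tensor([28.0, 34.0])  # where its content should move
    edited = drag_latent_one_step(latent, extractor, handle, target)
    print("mean |change| in latent:", (edited - latent).abs().mean().item())

Because only the latent (not the model weights) is updated, the edit stays on the pretrained model's learned image manifold, which is what makes single-step latent optimization a plausible route to coherent, spatially precise edits.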