

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models

July 5, 2023
Authors: Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang
cs.AI

Abstract

Despite the ability of existing large-scale text-to-image (T2I) models to generate high-quality images from detailed textual descriptions, they often lack the ability to precisely edit generated or real images. In this paper, we propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models. Specifically, we construct classifier guidance based on the strong correspondence of intermediate features in the diffusion model. Through a feature correspondence loss, it transforms editing signals into gradients that modify the intermediate representation of the diffusion model. Based on this guidance strategy, we also build multi-scale guidance that considers both semantic and geometric alignment. Moreover, cross-branch self-attention is added to maintain consistency between the original image and the editing result. Through an efficient design, our method achieves various editing modes for generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging. It is worth noting that all editing and content-preservation signals come from the image itself, and the model requires no fine-tuning or additional modules. Our source code will be available at https://github.com/MC-E/DragonDiffusion.
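To make the guidance mechanism described above concrete, here is a minimal, hypothetical PyTorch sketch of one guided denoising step: a feature correspondence loss compares intermediate UNet features of the current latent against features cached from the original image, and its gradient nudges the latent, in the manner of classifier guidance. The `return_features` hook, the mask shapes, and the `scale` value are illustrative assumptions for this sketch, not the authors' actual API; see the released code for the real implementation.

```python
import torch
import torch.nn.functional as F

def correspondence_guidance(z_t, t, unet, ref_feats, edit_mask, ref_mask, scale=4.0):
    """One guided denoising step via feature correspondence (illustrative sketch).

    z_t:       current noisy latent, shape (1, C, H, W)
    ref_feats: intermediate features cached from the original image's
               denoising trajectory, shape (1, D, h, w)
    edit_mask, ref_mask: boolean masks over the (h, w) feature grid that
               select the same number of positions (target and source regions)
    """
    z_t = z_t.detach().requires_grad_(True)

    # Hypothetical hook: run the denoiser and also return an intermediate
    # decoder feature map (a real model would expose this via forward hooks).
    _, feats = unet(z_t, t, return_features=True)        # feats: (1, D, h, w)

    # Cosine similarity between features at the edited (target) positions
    # and the cached features at the original (source) positions.
    f_edit = F.normalize(feats[0, :, edit_mask], dim=0)      # (D, N)
    f_ref  = F.normalize(ref_feats[0, :, ref_mask], dim=0)   # (D, N)
    loss = 1.0 - (f_edit * f_ref).sum(dim=0).mean()

    # Classifier-guidance step: convert the editing signal into a gradient
    # on the latent and nudge the sample toward higher correspondence.
    grad = torch.autograd.grad(loss, z_t)[0]
    return (z_t - scale * grad).detach()
```

In this framing, moving an object amounts to choosing `ref_mask` at the object's source location and `edit_mask` at its destination; no fine-tuning is involved because both the editing signal and the preservation signal come from features of the image itself.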