Editable Image Elements for Controllable Synthesis
April 24, 2024
Authors: Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park
cs.AI
Abstract
Diffusion models have made significant advances in text-guided synthesis
tasks. However, editing user-provided images remains challenging, as the high
dimensional noise input space of diffusion models is not naturally suited for
image inversion or spatial editing. In this work, we propose an image
representation that promotes spatial editing of input images using a diffusion
model. Concretely, we learn to encode an input into "image elements" that can
faithfully reconstruct an input image. These elements can be intuitively edited
by a user, and are decoded by a diffusion model into realistic images. We show
the effectiveness of our representation on various image editing tasks, such as
object resizing, rearrangement, dragging, de-occlusion, removal, variation, and
image composition. Project page:
https://jitengmu.github.io/Editable_Image_Elements/
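To make the encode-edit-decode workflow described above concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' released code or API; all class and function names (ImageElement, move_element, resize_element, remove_element) are illustrative assumptions, and the diffusion encoder/decoder are only stand-ins for the learned components the paper describes.

```python
# Hypothetical sketch of the workflow: encode an image into editable
# "image elements", apply spatial edits, then hand the edited elements
# to a diffusion decoder. Names and fields are assumptions for illustration.
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ImageElement:
    # Each element carries a spatial placement plus an appearance code.
    x: float            # center x, in relative image coordinates [0, 1]
    y: float            # center y, in relative image coordinates [0, 1]
    scale: float        # relative size of the element
    appearance: tuple   # placeholder for a learned appearance embedding


def move_element(elem: ImageElement, dx: float, dy: float) -> ImageElement:
    """Spatial edit: drag an element to a new location."""
    return replace(elem, x=elem.x + dx, y=elem.y + dy)


def resize_element(elem: ImageElement, factor: float) -> ImageElement:
    """Spatial edit: shrink or enlarge an element."""
    return replace(elem, scale=elem.scale * factor)


def remove_element(elements: List[ImageElement], idx: int) -> List[ImageElement]:
    """Object removal: drop one element; the decoder is expected to inpaint."""
    return [e for i, e in enumerate(elements) if i != idx]


if __name__ == "__main__":
    # Stand-in for the encoder output: two elements for two objects.
    elements = [
        ImageElement(x=0.3, y=0.6, scale=0.2, appearance=(0.1, 0.4)),
        ImageElement(x=0.7, y=0.5, scale=0.3, appearance=(0.8, 0.2)),
    ]
    # User edits: drag the first object right and enlarge it.
    edited = [resize_element(move_element(elements[0], 0.1, 0.0), 1.5), elements[1]]
    # A diffusion decoder would then map `edited` back to a realistic image.
    for e in edited:
        print(e)
```

The point of the sketch is only that edits operate on a small, interpretable set of element parameters rather than on the diffusion model's high-dimensional noise input, which is what the abstract argues makes spatial editing tractable.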