Editable Image Elements for Controllable Synthesis
April 24, 2024
Authors: Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park
cs.AI
Abstract
Diffusion models have made significant advances in text-guided synthesis
tasks. However, editing user-provided images remains challenging, as the high
dimensional noise input space of diffusion models is not naturally suited for
image inversion or spatial editing. In this work, we propose an image
representation that promotes spatial editing of input images using a diffusion
model. Concretely, we learn to encode an input into "image elements" that can
faithfully reconstruct an input image. These elements can be intuitively edited
by a user, and are decoded by a diffusion model into realistic images. We show
the effectiveness of our representation on various image editing tasks, such as
object resizing, rearrangement, dragging, de-occlusion, removal, variation, and
image composition. Project page:
https://jitengmu.github.io/Editable_Image_Elements/
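To make the encode-edit-decode workflow described above concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' released code or API; all class and function names (ImageElement, move_element, resize_element, remove_element) are illustrative assumptions, and the diffusion encoder/decoder are only stand-ins for the learned components the paper describes.

```python
# Hypothetical sketch of the workflow: encode an image into editable
# "image elements", apply spatial edits, then hand the edited elements
# to a diffusion decoder. Names and fields are assumptions for illustration.
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ImageElement:
    # Each element carries a spatial placement plus an appearance code.
    x: float            # center x, in relative image coordinates [0, 1]
    y: float            # center y, in relative image coordinates [0, 1]
    scale: float        # relative size of the element
    appearance: tuple   # placeholder for a learned appearance embedding


def move_element(elem: ImageElement, dx: float, dy: float) -> ImageElement:
    """Spatial edit: drag an element to a new location."""
    return replace(elem, x=elem.x + dx, y=elem.y + dy)


def resize_element(elem: ImageElement, factor: float) -> ImageElement:
    """Spatial edit: shrink or enlarge an element."""
    return replace(elem, scale=elem.scale * factor)


def remove_element(elements: List[ImageElement], idx: int) -> List[ImageElement]:
    """Object removal: drop one element; the decoder is expected to inpaint."""
    return [e for i, e in enumerate(elements) if i != idx]


if __name__ == "__main__":
    # Stand-in for the encoder output: two elements for two objects.
    elements = [
        ImageElement(x=0.3, y=0.6, scale=0.2, appearance=(0.1, 0.4)),
        ImageElement(x=0.7, y=0.5, scale=0.3, appearance=(0.8, 0.2)),
    ]
    # User edits: drag the first object right and enlarge it.
    edited = [resize_element(move_element(elements[0], 0.1, 0.0), 1.5), elements[1]]
    # A diffusion decoder would then map `edited` back to a realistic image.
    for e in edited:
        print(e)
```

The point of the sketch is only that edits operate on a small, interpretable set of element parameters rather than on the diffusion model's high-dimensional noise input, which is what the abstract argues makes spatial editing tractable.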