Editable Image Elements for Controllable Synthesis
April 24, 2024
Authors: Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park
cs.AI
Abstract
Diffusion models have made significant advances in text-guided synthesis
tasks. However, editing user-provided images remains challenging, as the
high-dimensional noise input space of diffusion models is not naturally suited for
image inversion or spatial editing. In this work, we propose an image
representation that promotes spatial editing of input images using a diffusion
model. Concretely, we learn to encode an input into "image elements" that can
faithfully reconstruct an input image. These elements can be intuitively edited
by a user, and are decoded by a diffusion model into realistic images. We show
the effectiveness of our representation on various image editing tasks, such as
object resizing, rearrangement, dragging, de-occlusion, removal, variation, and
image composition. Project page:
https://jitengmu.github.io/Editable_Image_Elements/
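The abstract outlines an encode/edit/decode workflow: an encoder turns the input image into editable "image elements", the user manipulates them spatially, and a diffusion model decodes the edited elements into a realistic image. Below is a minimal sketch of that workflow; the `ImageElement` fields and the `encoder`/`diffusion_decoder` callables are hypothetical placeholders for illustration, not the paper's actual interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ImageElement:
    """Hypothetical editable element: an appearance code plus spatial attributes."""
    content_code: Any   # learned appearance features, kept fixed during spatial edits
    x: float            # center position (normalized), editable by the user
    y: float
    scale: float        # element size, editable (enables object resizing)


def edit_image(
    image: Any,
    encoder: Callable[[Any], List[ImageElement]],
    diffusion_decoder: Callable[[List[ImageElement]], Any],
) -> Any:
    # 1. Encode the input into image elements that can faithfully reconstruct it.
    elements = encoder(image)

    # 2. Example spatial edits on the elements (dragging, resizing, removal).
    elements[0].x += 0.1        # drag an object to the right
    elements[1].scale *= 0.5    # shrink an object
    elements.pop(2)             # remove an object entirely

    # 3. Decode the edited elements into a realistic image with the diffusion model.
    return diffusion_decoder(elements)
```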