Continuous Layout Editing of Single Images with Diffusion Models
June 22, 2023
Authors: Zhiyuan Zhang, Zhitong Huang, Jing Liao
cs.AI
Abstract
Recent advancements in large-scale text-to-image diffusion models have
enabled many applications in image editing. However, none of these methods have
been able to edit the layout of single existing images. To address this gap, we
propose the first framework for layout editing of a single image while
preserving its visual properties, thus allowing for continuous editing on a
single image. Our approach is achieved through two key modules. First, to
preserve the characteristics of multiple objects within an image, we
disentangle the concepts of different objects and embed them into separate
textual tokens using a novel method called masked textual inversion. Next, we
propose a training-free optimization method to perform layout control for a
pre-trained diffusion model, which allows us to regenerate images with learned
concepts and align them with user-specified layouts. As the first framework to
edit the layout of existing images, we demonstrate that our method is effective
and outperforms other baselines that were modified to support this task. Our
code will be freely available for public use upon acceptance.
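To make the two modules concrete, the following is a minimal sketch of the kind of objectives the abstract describes: a textual-inversion reconstruction loss confined to one object's mask, and a layout penalty on where a token's cross-attention mass falls. Function names, array shapes, and the exact loss forms here are illustrative assumptions, not the paper's implementation; the actual losses operate on a pretrained diffusion model's noise predictions and cross-attention maps.

```python
import numpy as np

def masked_inversion_loss(noise_pred, noise_target, mask):
    """Masked textual inversion (sketch): compute the usual diffusion
    reconstruction error, but average it only inside the object's mask,
    so the token being optimized absorbs that object's appearance alone."""
    se = (noise_pred - noise_target) ** 2
    return float((se * mask).sum() / max(mask.sum(), 1e-8))

def layout_loss(attn_map, box_mask):
    """Training-free layout guidance (sketch): penalize the fraction of a
    token's cross-attention mass that falls outside its user-specified box.
    Minimizing this during sampling steers the object toward the box."""
    inside = (attn_map * box_mask).sum()
    return float(1.0 - inside / max(attn_map.sum(), 1e-8))
```

During editing, one such layout term per object (plus the learned per-object tokens from inversion) would be summed and minimized, which is how the regenerated image can follow a user-specified layout without fine-tuning the diffusion model itself.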