Continuous Layout Editing of Single Images with Diffusion Models
June 22, 2023
Authors: Zhiyuan Zhang, Zhitong Huang, Jing Liao
cs.AI
Abstract
Recent advancements in large-scale text-to-image diffusion models have enabled many applications in image editing. However, none of these methods has been able to edit the layout of a single existing image. To address this gap, we propose the first framework for layout editing of a single image while preserving its visual properties, thus allowing for continuous editing of a single image. Our approach is built on two key modules. First, to preserve the characteristics of multiple objects within an image, we disentangle the concepts of different objects and embed them into separate textual tokens using a novel method called masked textual inversion. Next, we propose a training-free optimization method to perform layout control for a pre-trained diffusion model, which allows us to regenerate images with the learned concepts and align them with user-specified layouts. As the first framework to edit the layout of existing images, we demonstrate that our method is effective and outperforms other baselines that were modified to support this task. Our code will be freely available for public use upon acceptance.
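To make the first module concrete, below is a minimal sketch of the masked-textual-inversion idea as described in the abstract, assuming a diffusers-style UNet and noise scheduler. All function and argument names are illustrative assumptions, not the authors' actual code: each object's learnable token embedding is supervised by the standard denoising loss, but that loss is confined to the object's mask so the learned concepts stay disentangled.

```python
import torch
import torch.nn.functional as F

def masked_inversion_loss(unet, scheduler, latents, cond_embeds, masks):
    """Hypothetical sketch, not the paper's implementation.
    latents: (1, 4, h, w) VAE latents of the input image.
    cond_embeds: list of prompt embeddings, one per object, each containing
    that object's learnable token. masks: list of (1, 1, H, W) binary masks."""
    loss = latents.new_zeros(())
    for cond, mask in zip(cond_embeds, masks):
        # Noise the latents at a random timestep, as in standard training.
        t = torch.randint(0, scheduler.config.num_train_timesteps, (1,),
                          device=latents.device)
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t)

        # Predict the noise conditioned on this object's token embedding.
        pred = unet(noisy, t, encoder_hidden_states=cond).sample

        # Restrict the reconstruction loss to the object's masked region,
        # so only that object's appearance flows into its token.
        m = F.interpolate(mask.float(), size=pred.shape[-2:], mode="nearest")
        loss = loss + F.mse_loss(pred * m, noise * m)
    return loss
```

In this reading, only the token embeddings inside `cond_embeds` receive gradients; the UNet stays frozen, which is what keeps the method applicable to a single image.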
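The second module, training-free layout control, is commonly realized by guiding the noisy latents with a cross-attention objective; the sketch below assumes that reading of the abstract and is not necessarily the authors' exact formulation. The `attn_fn` hook that extracts per-token attention maps from the UNet is a hypothetical helper supplied by the caller.

```python
import torch

def layout_loss(attn_maps, token_ids, regions):
    """attn_maps: (num_tokens, H, W) cross-attention from a UNet layer.
    token_ids: prompt index of each learned object token.
    regions: list of (H, W) binary masks for the user-specified layout."""
    loss = attn_maps.new_zeros(())
    for tid, region in zip(token_ids, regions):
        a = attn_maps[tid]
        inside = (a * region).sum() / (a.sum() + 1e-8)
        loss = loss + (1.0 - inside) ** 2  # penalize attention outside region
    return loss

def guided_step(latents, attn_fn, token_ids, regions, step_size=0.1):
    """One guidance step before the scheduler update. attn_fn is a
    hypothetical hook: it runs the UNet on `latents` and returns the
    (num_tokens, H, W) cross-attention maps with gradients attached."""
    latents = latents.detach().requires_grad_(True)
    loss = layout_loss(attn_fn(latents), token_ids, regions)
    grad = torch.autograd.grad(loss, latents)[0]
    return (latents - step_size * grad).detach()
```

Because the guidance only perturbs the sampling trajectory, no fine-tuning of the diffusion model is needed, which matches the abstract's "training-free" claim.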