

Diffusion Self-Guidance for Controllable Image Generation

June 1, 2023
Authors: Dave Epstein, Allan Jabri, Ben Poole, Alexei A. Efros, Aleksander Holynski
cs.AI

Abstract

Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. We demonstrate that properties such as the shape, location, and appearance of objects can be extracted from these representations and used to steer sampling. Self-guidance works similarly to classifier guidance, but uses signals present in the pretrained model itself, requiring no additional models or training. We show how a simple set of properties can be composed to perform challenging image manipulations, such as modifying the position or size of objects, merging the appearance of objects in one image with the layout of another, composing objects from many images into one, and more. We also show that self-guidance can be used to edit real images. For results and an interactive demo, see our project page at https://dave.ml/selfguidance/
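To make the mechanism concrete, below is a minimal sketch of a self-guidance-style sampling step in PyTorch. It assumes a hypothetical denoiser `model(x_t, t, prompt)` that returns both its noise prediction and its internal cross-attention maps; the names `model`, `token_idx`, and `target_xy` are illustrative placeholders, not the authors' actual API. The sketch steers one object's position by penalizing the distance between the centroid of its attention map and a target location, analogous to how classifier guidance shifts the score with an external gradient.

```python
# Minimal sketch of a self-guidance-style denoising step (assumptions:
# `model(x_t, t, prompt)` returns (noise_prediction, attention_maps),
# where attention_maps has shape (num_tokens, H, W)). Illustrative only.
import torch

def centroid(attn):
    """Center of mass of a single (H, W) attention map."""
    h, w = attn.shape
    ys = torch.arange(h, dtype=attn.dtype).view(h, 1)
    xs = torch.arange(w, dtype=attn.dtype).view(1, w)
    total = attn.sum() + 1e-8
    return torch.stack([(attn * ys).sum() / total,
                        (attn * xs).sum() / total])

def self_guided_step(model, x_t, t, prompt, token_idx, target_xy, scale=1.0):
    """One denoising step that nudges the attention centroid of the
    token at `token_idx` toward `target_xy` (a tensor of shape (2,))."""
    x_t = x_t.detach().requires_grad_(True)
    eps, attn = model(x_t, t, prompt)
    # Guidance energy: squared distance from the object's attention
    # centroid to the desired position.
    g = ((centroid(attn[token_idx]) - target_xy) ** 2).sum()
    grad = torch.autograd.grad(g, x_t)[0]
    # Shift the predicted noise by the energy gradient, in the spirit
    # of classifier guidance, but using the model's own internals.
    return eps + scale * grad
```

In the paper's framing, several such energy terms (on shape, size, position, and appearance features) can be summed and weighted to compose the more complex edits the abstract describes; this sketch shows only the single-property case.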