Diffusion Self-Guidance for Controllable Image Generation
June 1, 2023
Authors: Dave Epstein, Allan Jabri, Ben Poole, Alexei A. Efros, Aleksander Holynski
cs.AI
Abstract
Large-scale generative models are capable of producing high-quality images
from detailed text descriptions. However, many aspects of an image are
difficult or impossible to convey through text. We introduce self-guidance, a
method that provides greater control over generated images by guiding the
internal representations of diffusion models. We demonstrate that properties
such as the shape, location, and appearance of objects can be extracted from
these representations and used to steer sampling. Self-guidance works similarly
to classifier guidance, but uses signals present in the pretrained model
itself, requiring no additional models or training. We show how a simple set of
properties can be composed to perform challenging image manipulations, such as
modifying the position or size of objects, merging the appearance of objects in
one image with the layout of another, composing objects from many images into
one, and more. We also show that self-guidance can be used to edit real images.
For results and an interactive demo, see our project page at
https://dave.ml/selfguidance/
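The abstract notes that self-guidance steers sampling "similarly to classifier guidance": at each denoising step, the gradient of an energy defined on a property of the generation is added to the model's noise prediction. Below is a minimal NumPy sketch of that mechanism under heavy simplification. All names (`denoise_step`, the dummy noise prediction, the scalar "property") are hypothetical stand-ins, and the property here is the mean intensity of `x` rather than a quantity extracted from a diffusion model's internal attention maps, as the paper actually does.

```python
import numpy as np

def denoise_step(x, guidance_scale=0.0, target=0.5):
    """One guided denoising step (toy sketch, not the paper's sampler).

    `eps` stands in for a diffusion model's noise prediction. In
    self-guidance the energy would be computed on internal model
    representations (e.g. attention maps); here the "property" is just
    the mean of x so the gradient has a closed form.
    """
    eps = 0.1 * x                                  # dummy noise prediction
    prop = x.mean()                                # toy property of the sample
    # energy = (prop - target)^2; analytic gradient w.r.t. each element of x
    grad_energy = 2.0 * (prop - target) / x.size
    eps_guided = eps + guidance_scale * grad_energy
    return x - eps_guided                          # simple Euler-style update

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))
x_free, x_guided = x0.copy(), x0.copy()
for _ in range(50):
    x_free = denoise_step(x_free, guidance_scale=0.0)
    x_guided = denoise_step(x_guided, guidance_scale=20.0)
# the guided sample's property lands near the target; the unguided one does not
```

The key design point mirrored here is the one the abstract emphasizes: no extra model is trained — the guidance signal is a differentiable function of quantities the sampler already computes, so controlling position, size, or appearance reduces to choosing the right energy term.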