Generative Photomontage
August 13, 2024
Authors: Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu
cs.AI
Abstract
Text-to-image models are powerful tools for image creation. However, the
generation process is akin to a dice roll, making it difficult to achieve a
single image that captures everything a user wants. In this paper, we propose a
framework for creating the desired image by compositing it from various parts
of generated images, in essence forming a Generative Photomontage. Given a
stack of images generated by ControlNet using the same input condition and
different seeds, we let users select desired parts from the generated results
using a brush stroke interface. We introduce a novel technique that takes in
the user's brush strokes, segments the generated images using a graph-based
optimization in diffusion feature space, and then composites the segmented
regions via a new feature-space blending method. Our method faithfully
preserves the user-selected regions while compositing them harmoniously. We
demonstrate that our flexible framework can be used for many applications,
including generating new appearance combinations, fixing incorrect shapes and
artifacts, and improving prompt alignment. We show compelling results for each
application and demonstrate that our method outperforms existing image blending
methods and various baselines.