Composite Diffusion | whole >= Σparts
July 25, 2023
Authors: Vikram Jamwal, Ramaneswaran S
cs.AI
Abstract
For an artist or a graphic designer, the spatial layout of a scene is a
critical design choice. However, existing text-to-image diffusion models
provide limited support for incorporating spatial information. This paper
introduces Composite Diffusion as a means for artists to generate high-quality
images by composing them from sub-scenes. Artists can specify the
arrangement of these sub-scenes through a flexible free-form segment layout.
They can describe the content of each sub-scene primarily using natural text
and additionally by utilizing reference images or control inputs such as line
art, scribbles, human pose, Canny edges, and more.
We provide a comprehensive and modular method for Composite Diffusion that
enables alternative ways of generating, composing, and harmonizing sub-scenes.
Further, we wish to evaluate the composite image in terms of both image
quality and how well it achieves the artist's intent. We argue that existing image
quality metrics lack a holistic evaluation of image composites. To address
this, we propose novel quality criteria especially relevant to composite
generation.
We believe that our approach provides an intuitive method of art creation.
Through extensive user surveys and quantitative and qualitative analyses, we show
how it achieves greater spatial, semantic, and creative control over image
generation. In addition, our methods do not need to retrain or modify the
architecture of the base diffusion models and can work in a plug-and-play
manner with fine-tuned models.