Composite Diffusion | whole >= Σparts
July 25, 2023
Authors: Vikram Jamwal, Ramaneswaran S
cs.AI
Abstract
For an artist or a graphic designer, the spatial layout of a scene is a
critical design choice. However, existing text-to-image diffusion models
provide limited support for incorporating spatial information. This paper
introduces Composite Diffusion as a means for artists to generate high-quality
images by composing them from sub-scenes. Artists can specify the
arrangement of these sub-scenes through a flexible free-form segment layout.
They can describe the content of each sub-scene primarily using natural text
and additionally by utilizing reference images or control inputs such as line
art, scribbles, human poses, Canny edges, and more.
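
To make the idea of a free-form segment layout concrete, the following is a
minimal sketch, not the paper's algorithm. It assumes the layout is stored as
an integer label map (one segment id per pixel) and that each sub-scene has
already been generated as a full-size grid. The function name
`compose_by_layout`, the data layout, and the constant stand-in "images" are
all hypothetical and chosen only for illustration; harmonizing the seams
between sub-scenes is not shown.

```python
from typing import Dict, List

Grid = List[List[int]]  # a 2-D grid of integer values


def compose_by_layout(layout: Grid, sub_scenes: Dict[int, Grid]) -> Grid:
    """Copy each pixel from the sub-scene whose segment id the layout assigns to it.

    Illustrative assumption: every sub-scene grid has the same size as the layout.
    """
    height, width = len(layout), len(layout[0])
    composite = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            segment_id = layout[y][x]
            composite[y][x] = sub_scenes[segment_id][y][x]
    return composite


# Hypothetical 3x4 layout with two free-form segments (ids 0 and 1).
layout = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]

# Stand-ins for per-segment generations: constant-valued grids.
sub_scenes = {
    0: [[10] * 4 for _ in range(3)],
    1: [[20] * 4 for _ in range(3)],
}

composite = compose_by_layout(layout, sub_scenes)
```

Because the layout is an arbitrary label map rather than a set of bounding
boxes, segments can take any shape, which is what "free-form" suggests here.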
We provide a comprehensive and modular method for Composite Diffusion that
enables alternative ways of generating, composing, and harmonizing sub-scenes.
Further, we aim to evaluate composite images on both image quality and
fidelity to the artist's intent. We argue that existing image quality
metrics do not evaluate image composites holistically. To address this,
we propose novel quality criteria that are especially relevant to composite
generation.
We believe that our approach provides an intuitive method of art creation.
Through extensive user surveys and quantitative and qualitative analyses, we
show how it achieves greater spatial, semantic, and creative control over image
generation. In addition, our methods do not require retraining or modifying the
architecture of the base diffusion models, and they can work in a plug-and-play
manner with fine-tuned models.