The Hidden Language of Diffusion Models
June 1, 2023
Authors: Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher, Michal Irani, Inbar Mosseri, Lior Wolf
cs.AI
Abstract
Text-to-image diffusion models have demonstrated an unparalleled ability to
generate high-quality, diverse images from a textual concept (e.g., "a doctor",
"love"). However, the internal process of mapping text to a rich visual
representation remains an enigma. In this work, we tackle the challenge of
understanding concept representations in text-to-image models by decomposing an
input text prompt into a small set of interpretable elements. This is achieved
by learning a pseudo-token that is a sparse weighted combination of tokens from
the model's vocabulary, with the objective of reconstructing the images
generated for the given concept. Applied over the state-of-the-art Stable
Diffusion model, this decomposition reveals non-trivial and surprising
structures in the representations of concepts. For example, we find that some
concepts such as "a president" or "a composer" are dominated by specific
instances (e.g., "Obama", "Biden") and their interpolations. Other concepts,
such as "happiness" combine associated terms that can be concrete ("family",
"laughter") or abstract ("friendship", "emotion"). In addition to peering into
the inner workings of Stable Diffusion, our method also enables applications
such as single-image decomposition to tokens, bias detection and mitigation,
and semantic image manipulation. Our code will be available at:
https://hila-chefer.github.io/Conceptor/
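
To make the decomposition idea concrete, below is a minimal, hypothetical PyTorch sketch of the core mechanism described in the abstract: a pseudo-token parameterized as a weighted combination of vocabulary token embeddings, with an L1 penalty encouraging sparsity. The embedding matrix, target vector, and loss here are toy stand-ins (the actual method optimizes against the reconstruction of images generated by Stable Diffusion for the concept); this is not the authors' implementation.

```python
# Hypothetical sketch (not the authors' implementation): learn a pseudo-token as a
# sparse weighted combination of token embeddings from a (toy) vocabulary.
# In the paper, the learning signal is the reconstruction of images generated for
# the concept by the diffusion model; here a fixed target vector stands in for it.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, dim = 1000, 64                 # toy vocabulary and embedding size
E = torch.randn(vocab_size, dim)           # frozen token-embedding matrix (stand-in)
target = torch.randn(dim)                  # stand-in for the concept's learning signal

w = torch.zeros(vocab_size, requires_grad=True)   # learnable per-token weights
optimizer = torch.optim.Adam([w], lr=1e-1)
l1_weight = 1e-2                           # strength of the sparsity penalty

for step in range(500):
    pseudo_token = w @ E                   # weighted combination of vocabulary embeddings
    recon_loss = F.mse_loss(pseudo_token, target)
    sparsity_loss = w.abs().sum()          # L1 penalty keeps only a few dominant tokens
    loss = recon_loss + l1_weight * sparsity_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The decomposition: the highest-weight tokens are the interpretable elements.
top_weights, top_ids = torch.topk(w.detach().abs(), k=10)
print(top_ids.tolist())
print(top_weights.tolist())
```

In the actual setting, the weights would index the text encoder's real vocabulary, so the highest-weight tokens (e.g., "Obama" and "Biden" for the concept "a president") form the interpretable decomposition reported in the paper.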