DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models
December 21, 2023
Authors: Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge
cs.AI
Abstract
The popularization of Text-to-Image (T2I) diffusion models enables the
generation of high-quality images from text descriptions. However, generating
diverse customized images with reference visual attributes remains challenging.
This work focuses on personalizing T2I diffusion models at a more abstract
concept or category level, adapting commonalities from a set of reference
images while creating new instances with sufficient variations. We introduce a
solution that allows a pretrained T2I diffusion model to learn a set of soft
prompts, enabling the generation of novel images by sampling prompts from the
learned distribution. These prompts offer text-guided editing capabilities and
additional flexibility in controlling variation and mixing between multiple
distributions. We also show the adaptability of the learned prompt distribution
to other tasks, such as text-to-3D. Finally, we demonstrate the effectiveness of
our approach through quantitative analysis, including automatic evaluation and
human assessment. Project website: https://briannlongzhao.github.io/DreamDistribution
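The abstract says a set of soft prompts is learned and that new prompts are sampled from the resulting distribution, with control over variation and mixing. It does not say how that distribution is parameterized. The sketch below assumes one natural choice: the K learned prompt embeddings are summarized as a per-dimension (diagonal) Gaussian, and samples are drawn by reparameterization. The names `fit_prompt_distribution`, `sample_prompt`, and `mix_distributions` are hypothetical, as are the `scale` knob for variation and the convex-combination rule for mixing. None of these are confirmed as the authors' method. The training loop through the frozen diffusion model is omitted, and embeddings are plain lists of floats.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Gaussian = Tuple[Vector, Vector]  # (per-dimension mean, per-dimension std)


def fit_prompt_distribution(prompts: Sequence[Vector]) -> Gaussian:
    """Summarize K soft-prompt embeddings as a diagonal Gaussian (assumed parameterization)."""
    if not prompts:
        raise ValueError("need at least one prompt embedding")
    k, dim = len(prompts), len(prompts[0])
    mean = [sum(p[d] for p in prompts) / k for d in range(dim)]
    # Population standard deviation over the K prompts, one value per dimension.
    std = [math.sqrt(sum((p[d] - mean[d]) ** 2 for p in prompts) / k) for d in range(dim)]
    return mean, std


def sample_prompt(dist: Gaussian, scale: float = 1.0,
                  rng: Optional[random.Random] = None) -> Vector:
    """Draw mean + scale * std * eps with eps ~ N(0, 1) (reparameterization trick).

    `scale` is a hypothetical variation knob: 0 returns the mean itself,
    larger values spread samples further from it.
    """
    rng = rng or random.Random()
    mean, std = dist
    return [m + scale * s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def mix_distributions(dists: Sequence[Gaussian], weights: Sequence[float]) -> Gaussian:
    """Blend prompt distributions via a convex combination of means and variances (assumed rule)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    w = [x / total for x in weights]  # normalize so the weights sum to 1
    dim = len(dists[0][0])
    mean = [sum(wi * d[0][j] for wi, d in zip(w, dists)) for j in range(dim)]
    var = [sum(wi * d[1][j] ** 2 for wi, d in zip(w, dists)) for j in range(dim)]
    return mean, [math.sqrt(v) for v in var]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy concepts, each with K=3 learned 4-dimensional soft prompts (illustrative numbers only).
    concept_a = fit_prompt_distribution(
        [[0.1, 0.2, 0.3, 0.4], [0.3, 0.1, 0.2, 0.5], [0.2, 0.3, 0.1, 0.6]])
    concept_b = fit_prompt_distribution(
        [[0.9, 0.8, 0.7, 0.6], [0.7, 0.9, 0.8, 0.5], [0.8, 0.7, 0.9, 0.4]])
    print("low-variation sample:", sample_prompt(concept_a, scale=0.5, rng=rng))
    mixed = mix_distributions([concept_a, concept_b], [0.7, 0.3])
    print("sample from 70/30 mix:", sample_prompt(mixed, rng=rng))
```

In a full pipeline, each sampled vector would presumably stand in for an encoded text prompt when conditioning the frozen text-to-image model. Sampling a fresh vector per image is what would give each generation a different instance of the same concept. Shrinking `scale` trades diversity for fidelity to the reference set.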