DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models
December 21, 2023
Authors: Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge
cs.AI
Abstract
The popularization of Text-to-Image (T2I) diffusion models enables the
generation of high-quality images from text descriptions. However, generating
diverse customized images with reference visual attributes remains challenging.
This work focuses on personalizing T2I diffusion models at a more abstract
concept or category level, adapting commonalities from a set of reference
images while creating new instances with sufficient variations. We introduce a
solution that allows a pretrained T2I diffusion model to learn a set of soft
prompts, enabling the generation of novel images by sampling prompts from the
learned distribution. These prompts offer text-guided editing capabilities and
additional flexibility in controlling variation and mixing between multiple
distributions. We also show the adaptability of the learned prompt distribution
to other tasks, such as text-to-3D. Finally, we demonstrate the effectiveness of
our approach through quantitative analysis, including automatic evaluation and
human assessment. Project website: https://briannlongzhao.github.io/DreamDistribution
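
The idea of sampling prompts from a learned distribution, controlling variation, and mixing distributions can be illustrated with a toy sketch. This is not the authors' implementation: it assumes, purely for illustration, that the learned soft prompts are summarized by an independent Gaussian per embedding dimension, that variation is controlled by scaling the standard deviation, and that two distributions are mixed by linear interpolation. The function names (`fit_gaussian`, `sample_prompt`, `mix`) and all numbers are hypothetical, and no diffusion model is involved.

```python
import random
from statistics import fmean, pstdev


def fit_gaussian(prompts):
    """Fit an independent (diagonal) Gaussian to equal-length prompt vectors."""
    dims = list(zip(*prompts))  # transpose: one tuple of values per dimension
    mean = [fmean(d) for d in dims]
    std = [pstdev(d) for d in dims]
    return mean, std


def sample_prompt(mean, std, scale=1.0, rng=random):
    """Draw mean + scale * std * eps with eps ~ N(0, 1).

    `scale` is an assumed knob for variation: 0 returns the mean,
    larger values spread samples further from it.
    """
    return [m + scale * s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def mix(dist_a, dist_b, weight=0.5):
    """Interpolate the means and stds of two distributions (assumed mixing rule)."""
    (mean_a, std_a), (mean_b, std_b) = dist_a, dist_b
    mean = [weight * a + (1 - weight) * b for a, b in zip(mean_a, mean_b)]
    std = [weight * a + (1 - weight) * b for a, b in zip(std_a, std_b)]
    return mean, std


# Toy stand-ins for 4 learned soft prompts, each a 3-dimensional embedding.
rng = random.Random(0)
learned = [[rng.gauss(1.0, 0.2) for _ in range(3)] for _ in range(4)]

mean, std = fit_gaussian(learned)
novel = sample_prompt(mean, std, scale=1.0, rng=rng)      # a new prompt embedding
collapsed = sample_prompt(mean, std, scale=0.0, rng=rng)  # no variation: the mean
```

In a real pipeline each sampled vector would stand in for a prompt embedding passed to the frozen diffusion model; here the vectors are only lists of floats.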