UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
November 14, 2023
Authors: Yanwu Xu, Yang Zhao, Zhisheng Xiao, Tingbo Hou
cs.AI
Abstract
Text-to-image diffusion models have demonstrated remarkable capabilities in
transforming textual prompts into coherent images, yet the computational cost
of their inference remains a persistent challenge. To address this issue, we
present UFOGen, a novel generative model designed for ultra-fast, one-step
text-to-image synthesis. In contrast to conventional approaches that focus on
improving samplers or employing distillation techniques for diffusion models,
UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN
objective. Leveraging a newly introduced diffusion-GAN objective and
initialization with pre-trained diffusion models, UFOGen excels in efficiently
generating high-quality images conditioned on textual descriptions in a single
step. Beyond traditional text-to-image generation, UFOGen showcases versatility
in applications. Notably, UFOGen stands among the pioneering models enabling
one-step text-to-image generation and diverse downstream tasks, presenting a
significant advancement in the landscape of efficient generative models.
*Work done as a student researcher at Google. † indicates equal contribution.