UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Abstract

Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models.

* Work done as a student researcher at Google. † indicates equal contribution.
