UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Abstract

Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models. \blfootnote{*Work done as a student researcher at Google. † indicates equal contribution.}
