UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Abstract

Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models. \blfootnote{*Work done as a student researcher at Google; dagger indicates equal contribution.}
