PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

Abstract

This technical report introduces PIXART-δ, a text-to-image synthesis framework that integrates the Latent Consistency Model (LCM) and ControlNet into the advanced PIXART-α model. PIXART-α is recognized for its ability to generate high-quality images at 1024px resolution through a remarkably efficient training process. Integrating LCM into PIXART-δ significantly accelerates inference, enabling high-quality images to be produced in just 2-4 steps. Notably, PIXART-δ generates 1024x1024 pixel images in 0.5 seconds, a 7x improvement over PIXART-α. Additionally, PIXART-δ is designed to be efficiently trainable on 32GB V100 GPUs within a single day. With its 8-bit inference capability (von Platen et al., 2023), PIXART-δ can synthesize 1024px images within an 8GB GPU memory budget, greatly enhancing its usability and accessibility. Furthermore, incorporating a ControlNet-like module enables fine-grained control over text-to-image diffusion models. We introduce a novel ControlNet-Transformer architecture, specifically tailored for Transformers, which achieves explicit controllability alongside high-quality image generation. As a state-of-the-art, open-source image generation model, PIXART-δ offers a promising alternative to the Stable Diffusion family of models and contributes significantly to text-to-image synthesis.
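To make the "2-4 steps" claim concrete, the following is a minimal sketch of generic multistep consistency sampling, the kind of procedure that few-step LCM inference relies on. It is not the authors' implementation. The linear beta schedule, the timestep list, and the names `make_alphas_cumprod`, `few_step_sample`, and `toy_consistency_fn` are all assumptions made for illustration, and the toy function stands in for a distilled, text-conditioned network operating on image latents.

```python
import math
import random


def make_alphas_cumprod(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed schedule)."""
    values, prod = [], 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        prod *= 1.0 - beta
        values.append(prod)
    return values


def few_step_sample(consistency_fn, alphas_cumprod, timesteps, size, rng):
    """Multistep consistency sampling over a short, decreasing list of timesteps.

    Each step makes one call to consistency_fn, which maps a noisy sample
    directly to an estimate of the clean sample. Between steps, the estimate
    is re-noised to the next (lower) noise level.
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]  # start from pure Gaussian noise
    x0 = x
    for i, t in enumerate(timesteps):
        x0 = consistency_fn(x, t)  # one network evaluation per step
        if i + 1 < len(timesteps):
            a = alphas_cumprod[timesteps[i + 1]]
            # Re-noise the clean estimate to the next timestep's noise level.
            x = [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]
    return x0


def toy_consistency_fn(x, t):
    """Hypothetical stand-in for the distilled network: it simply halves the input."""
    return [0.5 * v for v in x]


# Four steps means four function evaluations in total.
latent = few_step_sample(
    toy_consistency_fn,
    make_alphas_cumprod(),
    timesteps=[999, 759, 499, 259],
    size=16,
    rng=random.Random(0),
)
print(len(latent))
```

The cost of sampling scales with the number of entries in `timesteps`, so cutting a schedule of several tens of steps down to 2-4 is where the speed-up in such a scheme comes from. In a real pipeline the returned latent would still need to be decoded to pixels.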
