LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
May 23, 2024
Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber
cs.AI
Abstract
Advances in latent diffusion models (LDMs) have revolutionized
high-resolution image generation, but the design space of the autoencoder that
is central to these systems remains underexplored. In this paper, we introduce
LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete
wavelet transform to enhance scalability and computational efficiency over
standard variational autoencoders (VAEs) with no sacrifice in output quality.
We also investigate the training methodologies and the decoder architecture of
LiteVAE and propose several enhancements that improve the training dynamics and
reconstruction quality. Our base LiteVAE model matches the quality of the
established VAEs in current LDMs with a six-fold reduction in encoder
parameters, leading to faster training and lower GPU memory requirements, while
our larger model outperforms VAEs of comparable complexity across all evaluated
metrics (rFID, LPIPS, PSNR, and SSIM).
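To make the core idea concrete, below is a minimal sketch (not the authors' code) of how an image can be decomposed with a 2D discrete wavelet transform and the resulting subbands fed to a small convolutional encoder in place of a heavy pixel-space VAE encoder. The helper names, wavelet choice, level count, and layer sizes are illustrative assumptions and do not reflect the actual LiteVAE architecture.

```python
# Sketch only: single-level 2D DWT front-end for a lightweight encoder.
# Uses PyWavelets (pywt) and PyTorch; all sizes/names here are hypothetical.
import numpy as np
import pywt
import torch
import torch.nn as nn


def dwt2_features(image: torch.Tensor, wavelet: str = "haar") -> torch.Tensor:
    """Apply a single-level 2D DWT per channel and stack the LL/LH/HL/HH
    subbands as extra channels.

    image: (C, H, W) float tensor. Returns a (4*C, H//2, W//2) tensor.
    """
    subbands = []
    for channel in image.cpu().numpy():
        ll, (lh, hl, hh) = pywt.dwt2(channel, wavelet)
        subbands.extend([ll, lh, hl, hh])
    return torch.from_numpy(np.stack(subbands)).float()


class TinyWaveletEncoder(nn.Module):
    """Illustrative lightweight encoder that operates on wavelet subbands
    instead of raw pixels (hypothetical layer sizes, not the paper's)."""

    def __init__(self, in_channels: int = 12, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            # Outputs mean and log-variance channels of a diagonal Gaussian latent.
            nn.Conv2d(64, 2 * latent_channels, kernel_size=3, padding=1),
        )

    def forward(self, subbands: torch.Tensor) -> torch.Tensor:
        return self.net(subbands)


if __name__ == "__main__":
    img = torch.rand(3, 256, 256)                        # dummy RGB image
    feats = dwt2_features(img)                           # (12, 128, 128)
    moments = TinyWaveletEncoder()(feats.unsqueeze(0))   # (1, 8, 32, 32)
    print(feats.shape, moments.shape)
```

Because the wavelet transform already halves the spatial resolution and exposes multi-frequency structure, the convolutional part of such an encoder can be much smaller than a pixel-space encoder for the same latent resolution, which is the efficiency argument the abstract makes.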