LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
May 23, 2024
Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber
cs.AI
Abstract
Advances in latent diffusion models (LDMs) have revolutionized
high-resolution image generation, but the design space of the autoencoder that
is central to these systems remains underexplored. In this paper, we introduce
LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete
wavelet transform to enhance scalability and computational efficiency over
standard variational autoencoders (VAEs) with no sacrifice in output quality.
We also investigate the training methodologies and the decoder architecture of
LiteVAE and propose several enhancements that improve the training dynamics and
reconstruction quality. Our base LiteVAE model matches the quality of the
established VAEs in current LDMs with a six-fold reduction in encoder
parameters, leading to faster training and lower GPU memory requirements, while
our larger model outperforms VAEs of comparable complexity across all evaluated
metrics (rFID, LPIPS, PSNR, and SSIM).
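To make the core idea concrete, below is a minimal sketch (not the authors' code) of how an image can be decomposed with a 2D discrete wavelet transform and the resulting subbands fed to a small convolutional encoder in place of a heavy pixel-space VAE encoder. The helper names, wavelet choice, level count, and layer sizes are illustrative assumptions and do not reflect the actual LiteVAE architecture.

```python
# Sketch only: single-level 2D DWT front-end for a lightweight encoder.
# Uses PyWavelets (pywt) and PyTorch; all sizes/names here are hypothetical.
import numpy as np
import pywt
import torch
import torch.nn as nn


def dwt2_features(image: torch.Tensor, wavelet: str = "haar") -> torch.Tensor:
    """Apply a single-level 2D DWT per channel and stack the LL/LH/HL/HH
    subbands as extra channels.

    image: (C, H, W) float tensor. Returns a (4*C, H//2, W//2) tensor.
    """
    subbands = []
    for channel in image.cpu().numpy():
        ll, (lh, hl, hh) = pywt.dwt2(channel, wavelet)
        subbands.extend([ll, lh, hl, hh])
    return torch.from_numpy(np.stack(subbands)).float()


class TinyWaveletEncoder(nn.Module):
    """Illustrative lightweight encoder that operates on wavelet subbands
    instead of raw pixels (hypothetical layer sizes, not the paper's)."""

    def __init__(self, in_channels: int = 12, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            # Outputs mean and log-variance channels of a diagonal Gaussian latent.
            nn.Conv2d(64, 2 * latent_channels, kernel_size=3, padding=1),
        )

    def forward(self, subbands: torch.Tensor) -> torch.Tensor:
        return self.net(subbands)


if __name__ == "__main__":
    img = torch.rand(3, 256, 256)                        # dummy RGB image
    feats = dwt2_features(img)                           # (12, 128, 128)
    moments = TinyWaveletEncoder()(feats.unsqueeze(0))   # (1, 8, 32, 32)
    print(feats.shape, moments.shape)
```

Because the wavelet transform already halves the spatial resolution and exposes multi-frequency structure, the convolutional part of such an encoder can be much smaller than a pixel-space encoder for the same latent resolution, which is the efficiency argument the abstract makes.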