Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
June 13, 2024
Authors: Qihao Liu, Zhanpeng Zeng, Ju He, Qihang Yu, Xiaohui Shen, Liang-Chieh Chen
cs.AI
Abstract
This paper presents innovative enhancements to diffusion models by
integrating a novel multi-resolution network and time-dependent layer
normalization. Diffusion models have gained prominence for their effectiveness
in high-fidelity image generation. While conventional approaches rely on
convolutional U-Net architectures, recent Transformer-based designs have
demonstrated superior performance and scalability. However, Transformer
architectures, which tokenize input data via "patchification", face a
trade-off between visual fidelity and computational complexity because the
cost of self-attention grows quadratically with token length. Larger patch
sizes make attention cheaper to compute, but they struggle to
capture fine-grained visual details, leading to image distortions. To address
this challenge, we propose augmenting the Diffusion model with the
Multi-Resolution network (DiMR), a framework that refines features across
multiple resolutions, progressively enhancing detail from low to high
resolution. Additionally, we introduce Time-Dependent Layer Normalization
(TD-LN), a parameter-efficient approach that incorporates time-dependent
parameters into layer normalization to inject time information and achieve
superior performance. Our method's efficacy is demonstrated on the
class-conditional ImageNet generation benchmark, where DiMR-XL variants
outperform prior diffusion models, setting new state-of-the-art FID scores of
1.70 on ImageNet 256 x 256 and 2.89 on ImageNet 512 x 512. Project page:
https://qihao067.github.io/projects/DiMR
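
The patch-size trade-off mentioned in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names `num_tokens` and `attention_pairs` and the 64 x 64 input size are assumptions. It counts the tokens produced by patchifying a square input, then the number of token pairs that full self-attention compares.

```python
def num_tokens(side: int, patch: int) -> int:
    """Number of tokens from splitting a side x side input into patch x patch tiles."""
    if side % patch != 0:
        raise ValueError("side must be divisible by patch")
    per_side = side // patch
    return per_side * per_side


def attention_pairs(side: int, patch: int) -> int:
    """Token pairs compared by full self-attention: quadratic in the token count."""
    n = num_tokens(side, patch)
    return n * n


if __name__ == "__main__":
    side = 64  # hypothetical input resolution, chosen only for illustration
    for patch in (2, 4, 8):
        n = num_tokens(side, patch)
        pairs = attention_pairs(side, patch)
        print(f"patch={patch}: tokens={n}, attention pairs={pairs}")
```

Under these assumptions, doubling the patch size cuts the token count by a factor of 4 and the attention pair count by a factor of 16. Each token then has to summarize a region four times larger, which is where fine detail can be lost.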