FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
March 19, 2024
Authors: Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li
cs.AI
Abstract
In this study, we delve into the generation of high-resolution images from
pre-trained diffusion models, addressing persistent challenges, such as
repetitive patterns and structural distortions, that emerge when models are
applied beyond their trained resolutions. To this end, we introduce FouriScale,
an innovative, training-free approach grounded in frequency domain analysis.
We replace the original convolutional layers in
pre-trained diffusion models by incorporating a dilation technique along with a
low-pass operation, intending to achieve structural consistency and scale
consistency across resolutions, respectively. Further enhanced by a
padding-then-crop strategy, our method can flexibly handle text-to-image
generation of various aspect ratios. By using FouriScale as guidance, our
method balances the structural integrity and fidelity of generated
images, enabling arbitrary-size, high-resolution,
and high-quality generation. With its simplicity and compatibility, our method
can provide valuable insights for future explorations into the synthesis of
ultra-high-resolution images. The code will be released at
https://github.com/LeonHLJ/FouriScale.
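
To make the idea of pairing a dilated convolution with a low-pass operation more concrete, here is a minimal NumPy sketch on a single-channel image. It is not the authors' implementation: the function names, the single `scale` parameter that sets both the dilation factor and the frequency cutoff, and the choice of an ideal (box) low-pass filter are all assumptions made for illustration.

```python
import numpy as np


def ideal_low_pass(x, scale):
    """Zero out every frequency at or above 1/scale of the Nyquist limit.

    Assumption: an ideal box filter; the abstract does not specify the filter.
    x is a 2-D array of shape (H, W).
    """
    h, w = x.shape
    fy = np.abs(np.fft.fftfreq(h))[:, None]  # normalized frequencies in [0, 0.5]
    fx = np.abs(np.fft.fftfreq(w))[None, :]
    mask = (fy < 0.5 / scale) & (fx < 0.5 / scale)
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))


def dilated_conv2d(x, kernel, dilation):
    """Cross-correlate x with an odd-sized kernel whose taps are spaced
    `dilation` pixels apart, using zero padding so the output keeps x's shape."""
    kh, kw = kernel.shape
    ph, pw = dilation * (kh // 2), dilation * (kw // 2)
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            window = padded[i * dilation:i * dilation + x.shape[0],
                            j * dilation:j * dilation + x.shape[1]]
            out += kernel[i, j] * window
    return out


def low_pass_dilated_conv(x, kernel, scale):
    """Hypothetical combination: low-pass the input, then apply the
    unchanged kernel with dilation equal to the resolution scale."""
    return dilated_conv2d(ideal_low_pass(x, scale), kernel, dilation=scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((16, 16))   # stand-in for a feature map
    kernel = rng.standard_normal((3, 3))    # stand-in for pre-trained weights
    print(low_pass_dilated_conv(image, kernel, scale=2).shape)
```

In this sketch the kernel weights are left untouched and only the sampling pattern and the input spectrum change, which mirrors the training-free framing of the abstract. The padding-then-crop strategy and the guidance step are not covered here.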