
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

March 19, 2024
Authors: Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li
cs.AI

Abstract

In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions. To tackle these issues, we introduce FouriScale, an innovative, training-free approach grounded in frequency-domain analysis. We replace the original convolutional layers in pre-trained diffusion models with versions that incorporate a dilation technique and a low-pass operation, intended to achieve structural consistency and scale consistency across resolutions, respectively. Further enhanced by a padding-then-crop strategy, our method can flexibly handle text-to-image generation at various aspect ratios. By using FouriScale as guidance, our method successfully balances the structural integrity and fidelity of generated images, achieving an astonishing capacity for arbitrary-size, high-resolution, and high-quality generation. With its simplicity and compatibility, our method can provide valuable insights for future explorations into the synthesis of ultra-high-resolution images. The code will be released at https://github.com/LeonHLJ/FouriScale.
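To make the "dilation plus low-pass" idea more concrete, here is a minimal NumPy sketch. It is not the authors' released code. It assumes single-channel feature maps, an integer scale factor s, and an ideal (box) low-pass mask that keeps only frequencies below 1/s of the Nyquist frequency; the cutoff choice and all function names (dilate_kernel, conv2d_same, ideal_low_pass, fouriscale_like_conv, upsample_nearest) are illustrative assumptions. The padding-then-crop step for non-integer scales and other aspect ratios is not shown.

```python
import numpy as np


def dilate_kernel(kernel: np.ndarray, rate: int) -> np.ndarray:
    """Spread the taps of `kernel` apart by inserting (rate - 1) zeros between them."""
    kh, kw = kernel.shape
    dilated = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1), dtype=float)
    dilated[::rate, ::rate] = kernel
    return dilated


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Cross-correlate x with an odd-sized kernel, zero-padding so the output keeps x's shape."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def ideal_low_pass(x: np.ndarray, scale: int) -> np.ndarray:
    """Zero every frequency above 1/scale of the Nyquist frequency (assumed cutoff) on each axis."""
    spectrum = np.fft.fft2(x)
    fy = np.abs(np.fft.fftfreq(x.shape[0]))  # cycles per sample; Nyquist is 0.5
    fx = np.abs(np.fft.fftfreq(x.shape[1]))
    cutoff = 0.5 / scale
    mask = (fy[:, None] <= cutoff) & (fx[None, :] <= cutoff)
    return np.real(np.fft.ifft2(spectrum * mask))


def fouriscale_like_conv(x: np.ndarray, kernel: np.ndarray, scale: int) -> np.ndarray:
    """Hypothetical replacement conv: low-pass the input, then apply the kernel dilated by `scale`."""
    return conv2d_same(ideal_low_pass(x, scale), dilate_kernel(kernel, scale))


def upsample_nearest(x: np.ndarray, scale: int) -> np.ndarray:
    """Repeat every pixel scale x scale times."""
    return x.repeat(scale, axis=0).repeat(scale, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8))
    k = rng.standard_normal((3, 3))
    s = 2
    # Dilation preserves structure: the dilated kernel on an s-times upsampled map
    # reproduces the original convolution output, upsampled.
    lhs = conv2d_same(upsample_nearest(x, s), dilate_kernel(k, s))
    rhs = upsample_nearest(conv2d_same(x, k), s)
    print(np.allclose(lhs, rhs))  # True
```

The nearest-upsampling identity in the usage example is one way to see why dilation helps with structural consistency: a 3x3 kernel dilated by s "sees" the same relative neighborhood on the larger map that it saw at the training resolution. The low-pass step discards frequency content the original resolution could not represent, which is how this sketch interprets "scale consistency".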
