FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Abstract

In this study, we address high-resolution image generation with pre-trained diffusion models, focusing on the repetitive patterns and structural distortions that emerge when a model is applied beyond the resolution it was trained on. To address this issue, we introduce FouriScale, a training-free approach grounded in frequency-domain analysis. We replace the original convolutional layers in pre-trained diffusion models with layers that combine a dilation technique and a low-pass operation, which are intended to achieve structural consistency and scale consistency across resolutions, respectively. With an additional padding-then-crop strategy, our method flexibly handles text-to-image generation at various aspect ratios. Using FouriScale as guidance, our method balances the structural integrity and fidelity of generated images, enabling arbitrary-size, high-resolution, high-quality generation. Because of its simplicity and compatibility, our method can offer useful insights for future work on ultra-high-resolution image synthesis. The code will be released at https://github.com/LeonHLJ/FouriScale.
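The two operations named in the abstract can be illustrated with a small one-dimensional sketch. This is not the authors' implementation: the function names (`dilate_kernel`, `circular_correlate`, `low_pass`), the circular boundary handling, and the ideal (brick-wall) filter are all assumptions made for illustration. It shows that a kernel dilated by a factor `scale`, applied at high resolution and then subsampled, matches the original kernel applied to the subsampled signal, and that low-pass filtering before subsampling removes the frequencies that would otherwise alias.

```python
import cmath
import math


def dilate_kernel(kernel, scale):
    """Insert (scale - 1) zeros between neighbouring kernel taps."""
    out = []
    for i, w in enumerate(kernel):
        out.append(w)
        if i < len(kernel) - 1:
            out.extend([0.0] * (scale - 1))
    return out


def circular_correlate(signal, kernel):
    """Correlate with wrap-around indexing (an assumed boundary rule)."""
    n = len(signal)
    return [
        sum(w * signal[(i + j) % n] for j, w in enumerate(kernel))
        for i in range(n)
    ]


def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def low_pass(signal, scale):
    """Ideal low-pass: zero every frequency at or above the Nyquist
    frequency of the signal subsampled by `scale`."""
    n = len(signal)
    cutoff = n // (2 * scale)
    spectrum = dft(signal)
    for k in range(n):
        if min(k, n - k) >= cutoff:  # distance of bin k from DC
            spectrum[k] = 0
    return [v.real for v in idft(spectrum)]


if __name__ == "__main__":
    scale = 2
    kernel = [1.0, -2.0, 1.0]
    x = [float((3 * t * t + 5 * t) % 7) for t in range(16)]

    # Structural consistency: dilated kernel at high resolution, then
    # subsampling, agrees with the original kernel at low resolution.
    high = circular_correlate(x, dilate_kernel(kernel, scale))
    low = circular_correlate(x[::scale], kernel)
    assert all(abs(a - b) < 1e-9 for a, b in zip(high[::scale], low))

    # Scale consistency: the high-frequency cosine would alias after
    # subsampling; the low-pass step removes it first.
    sig = [
        math.cos(2 * math.pi * t / 16) + math.cos(2 * math.pi * 6 * t / 16)
        for t in range(16)
    ]
    filtered = low_pass(sig, scale)
    assert all(
        abs(filtered[scale * m] - math.cos(2 * math.pi * m / 8)) < 1e-9
        for m in range(8)
    )
```

In a real diffusion model the same idea would apply to two-dimensional feature maps and learned convolution weights; the one-dimensional, pure-Python form is used here only to keep the identities easy to check.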
