HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling

Abstract

Diffusion models have emerged as the leading approach for image synthesis, demonstrating exceptional photorealism and diversity. However, training diffusion models at high resolutions remains computationally prohibitive, and existing zero-shot generation techniques for synthesizing images beyond training resolutions often produce artifacts, including object duplication and spatial incoherence. In this paper, we introduce HiWave, a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis using pretrained diffusion models. Our method employs a two-stage pipeline: generating a base image from the pretrained model followed by a patch-wise DDIM inversion step and a novel wavelet-based detail enhancer module. Specifically, we first utilize inversion methods to derive initial noise vectors that preserve global coherence from the base image. Subsequently, during sampling, our wavelet-domain detail enhancer retains low-frequency components from the base image to ensure structural consistency, while selectively guiding high-frequency components to enrich fine details and textures. Extensive evaluations using Stable Diffusion XL demonstrate that HiWave effectively mitigates common visual artifacts seen in prior methods, achieving superior perceptual quality. A user study confirmed HiWave's performance, where it was preferred over the state-of-the-art alternative in more than 80% of comparisons, highlighting its effectiveness for high-quality, ultra-high-resolution image synthesis without requiring retraining or architectural modifications.
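
The frequency split described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-level Haar transform, operates on a plain 2D array rather than on diffusion model predictions, and the names `haar_dwt2`, `haar_idwt2`, `mix_frequencies`, and the `high_gain` parameter are hypothetical, introduced only for illustration. It shows the general idea of keeping the low-frequency band of a base image while taking (and optionally scaling) the high-frequency bands from another source.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform of an array with even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency (coarse structure)
    lh = (a - b + c - d) / 2.0  # differences between adjacent columns
    hl = (a + b - c - d) / 2.0  # differences between adjacent rows
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, (lh, hl, hh)


def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def mix_frequencies(base, detail, high_gain=1.0):
    """Keep the low band of `base`; take the high bands from `detail`.

    `high_gain` is an illustrative (assumed) knob that scales the
    high-frequency bands; it is not a parameter taken from the paper.
    """
    ll_base, _ = haar_dwt2(base)
    _, highs = haar_dwt2(detail)
    highs = tuple(high_gain * band for band in highs)
    return haar_idwt2(ll_base, highs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal((8, 8))
    detail = rng.standard_normal((8, 8))
    mixed = mix_frequencies(base, detail, high_gain=1.5)
    # The low band of the result matches the base image's low band.
    print(np.allclose(haar_dwt2(mixed)[0], haar_dwt2(base)[0]))
```

In this toy setting the coarse structure of `base` is preserved exactly, while fine detail comes entirely from `detail`. How the method itself selects and guides high-frequency components during sampling is specified in the paper, not here.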
