Nested Diffusion Processes for Anytime Image Generation

Abstract

Diffusion models are the current state of the art in image generation, synthesizing high-quality images by breaking the generation process into many fine-grained denoising steps. Despite their strong performance, diffusion models are computationally expensive, requiring many neural function evaluations (NFEs). In this work, we propose an anytime diffusion-based method that can generate viable images when stopped at an arbitrary time before completion. Using existing pretrained diffusion models, we show that the generation scheme can be recomposed as two nested diffusion processes, enabling fast iterative refinement of a generated image. We use this Nested Diffusion approach to peek into the generation process and enable flexible scheduling based on the instantaneous preference of the user. In experiments on ImageNet and Stable Diffusion-based text-to-image generation, we show, both qualitatively and quantitatively, that our method's intermediate generation quality greatly exceeds that of the original diffusion model, while the final slow generation result remains comparable.
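To make the idea of two nested processes concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes an outer loop over noise levels in which each outer step runs a short inner denoising process to produce a clean estimate, and that estimate is exposed as a preview so the caller can stop at any time. The names toy_denoiser, inner_diffusion and nested_diffusion, the linear noise schedule, and the re-noising rule are all hypothetical choices made for illustration; a real system would call a pretrained diffusion model in place of toy_denoiser.

```python
import random
from typing import Callable, Iterator, List

Vector = List[float]
Denoiser = Callable[[Vector, float], Vector]

TARGET: Vector = [1.0, -1.0, 0.5]  # toy "clean image" (assumption for the demo)


def toy_denoiser(x: Vector, t: float) -> Vector:
    """Hypothetical stand-in for a pretrained model: predicts a clean sample.

    It pulls the input toward TARGET, more strongly at low noise level t.
    """
    w = 1.0 - 0.5 * t
    return [w * g + (1.0 - w) * xi for g, xi in zip(TARGET, x)]


def inner_diffusion(x_t: Vector, t: float, denoise: Denoiser,
                    inner_steps: int) -> Vector:
    """Short inner process: refine a clean estimate starting at noise level t."""
    x = list(x_t)
    for j in range(inner_steps):
        s = t * (1.0 - j / inner_steps)  # noise level decreasing from t toward 0
        x = denoise(x, s)
    return x


def nested_diffusion(denoise: Denoiser, dim: int, outer_steps: int,
                     inner_steps: int, rng: random.Random) -> Iterator[Vector]:
    """Yield one preview per outer step; the caller may stop consuming early."""
    x_t = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    for k in range(outer_steps):
        t = 1.0 - k / outer_steps
        t_next = 1.0 - (k + 1) / outer_steps
        x0_hat = inner_diffusion(x_t, t, denoise, inner_steps)
        yield x0_hat  # intermediate result, usable if generation stops here
        # Outer update (toy rule): re-noise the estimate to the next level.
        x_t = [(1.0 - t_next) * x0 + t_next * rng.gauss(0.0, 1.0)
               for x0 in x0_hat]


previews = list(nested_diffusion(toy_denoiser, dim=3, outer_steps=4,
                                 inner_steps=3, rng=random.Random(0)))
print(len(previews), "previews; final:", [round(v, 3) for v in previews[-1]])
```

Because nested_diffusion is a generator, a caller can take only the first preview with next(...) and skip the remaining outer steps. In this sketch the total number of denoiser calls for a full run is outer_steps * inner_steps, which plays the role of the NFE budget.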
