Nested Diffusion Processes for Anytime Image Generation
May 30, 2023
Authors: Noam Elata, Bahjat Kawar, Tomer Michaeli, Michael Elad
cs.AI
Abstract
Diffusion models are the current state-of-the-art in image generation,
synthesizing high-quality images by breaking down the generation process into
many fine-grained denoising steps. Despite their good performance, diffusion
models are computationally expensive, requiring many neural function
evaluations (NFEs). In this work, we propose an anytime diffusion-based method
that can generate viable images when stopped at arbitrary times before
completion. Using existing pretrained diffusion models, we show that the
generation scheme can be recomposed as two nested diffusion processes, enabling
fast iterative refinement of a generated image. We use this Nested Diffusion
approach to peek into the generation process and enable flexible scheduling
based on the instantaneous preference of the user. In experiments on ImageNet
and Stable Diffusion-based text-to-image generation, we show, both
qualitatively and quantitatively, that our method's intermediate generation
quality greatly exceeds that of the original diffusion model, while the final
slow generation result remains comparable.
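To make the anytime structure described in the abstract concrete, here is a rough NumPy sketch (not the authors' released implementation) of sampling with an outer diffusion process whose clean-image estimate at each step is produced by a short inner diffusion run, so a viable image is available whenever sampling is interrupted. All names here (`toy_denoiser`, `inner_diffusion`, `nested_anytime_sample`, the schedules and step counts) are illustrative assumptions.

```python
"""Illustrative sketch of anytime sampling with nested diffusion processes.

An outer diffusion loop repeatedly (1) runs a short inner diffusion chain from
the current noisy state down to time 0 to obtain a clean-image estimate, and
(2) re-noises that estimate to the next outer timestep. Stopping after any
outer step therefore still returns a full image estimate.
"""
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)


def toy_denoiser(x_t, t):
    """Stand-in for a pretrained noise-prediction network eps_theta(x_t, t)."""
    return x_t / np.sqrt(1.0 - alphas_bar[t] + 1e-8)


def predicted_x0(x_t, t, eps):
    """Clean-image prediction from a noisy sample and a noise estimate."""
    return (x_t - np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alphas_bar[t])


def inner_diffusion(x_t, t, n_inner):
    """Short deterministic (DDIM-like) chain from time t to 0; returns x0 estimate."""
    ts = np.linspace(t, 0, n_inner + 1).astype(int)
    x = x_t
    for cur, nxt in zip(ts[:-1], ts[1:]):
        eps = toy_denoiser(x, cur)
        x0 = predicted_x0(x, cur, eps)
        if nxt == 0:
            return x0
        # Deterministic step toward the next (smaller) timestep.
        x = np.sqrt(alphas_bar[nxt]) * x0 + np.sqrt(1.0 - alphas_bar[nxt]) * eps
    return x


def nested_anytime_sample(shape, n_outer=5, n_inner=4, stop_after=None, seed=0):
    """Anytime sampler: an image estimate exists after every outer step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure noise x_T
    outer_ts = np.linspace(T - 1, 0, n_outer + 1).astype(int)
    estimate = None
    for i, (cur, nxt) in enumerate(zip(outer_ts[:-1], outer_ts[1:])):
        # Inner process: a full (but short) diffusion to a clean estimate.
        estimate = inner_diffusion(x, cur, n_inner)
        if stop_after is not None and i + 1 >= stop_after:
            return estimate  # early stop still yields a viable image
        if nxt > 0:
            # Outer step: re-noise the current estimate to the next outer time.
            noise = rng.standard_normal(shape)
            x = np.sqrt(alphas_bar[nxt]) * estimate + np.sqrt(1.0 - alphas_bar[nxt]) * noise
    return estimate


if __name__ == "__main__":
    full = nested_anytime_sample((8, 8), n_outer=5, n_inner=4)
    early = nested_anytime_sample((8, 8), n_outer=5, n_inner=4, stop_after=2)
    print(full.shape, early.shape)
```

In this sketch the total NFE budget is roughly `n_outer * n_inner`, and interrupting after `k` outer steps costs only `k * n_inner` NFEs while still returning a complete image estimate; with a real pretrained denoiser this is the kind of fast iterative refinement and flexible scheduling the abstract refers to.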