

Nested Diffusion Processes for Anytime Image Generation

May 30, 2023
作者: Noam Elata, Bahjat Kawar, Tomer Michaeli, Michael Elad
cs.AI

Abstract

Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are computationally expensive, requiring many neural function evaluations (NFEs). In this work, we propose an anytime diffusion-based method that can generate viable images when stopped at arbitrary times before completion. Using existing pretrained diffusion models, we show that the generation scheme can be recomposed as two nested diffusion processes, enabling fast iterative refinement of a generated image. We use this Nested Diffusion approach to peek into the generation process and enable flexible scheduling based on the instantaneous preference of the user. In experiments on ImageNet and Stable Diffusion-based text-to-image generation, we show, both qualitatively and quantitatively, that our method's intermediate generation quality greatly exceeds that of the original diffusion model, while the final slow generation result remains comparable.
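The nested scheme described above can be illustrated with a toy sketch: an outer diffusion loop re-noises a clean estimate, while a cheap inner diffusion loop produces a viable image at every outer step, so sampling can be stopped at any time. Everything here (the `toy_denoiser`, the step counts, the re-noising schedule) is a hypothetical stand-in for the paper's pretrained models and samplers, not the authors' implementation.

```python
import math
import random


def toy_denoiser(x, t):
    # Hypothetical stand-in for a pretrained diffusion denoiser:
    # pulls the sample toward a fixed target more strongly as the
    # noise level t decreases.
    target = 1.0
    return x + (target - x) * (1.0 - t)


def nested_diffusion(outer_steps=4, inner_steps=3, stop_after=None):
    """Anytime sampling sketch: each outer step runs a short inner
    diffusion to produce a viable intermediate estimate, so stopping
    early still yields a usable result."""
    x = random.gauss(0.0, 1.0)  # start from pure noise
    previews = []
    for i in range(outer_steps):
        t_outer = 1.0 - i / outer_steps
        # Inner diffusion: iteratively refine a clean estimate from x.
        z = x
        for j in range(inner_steps):
            t_inner = t_outer * (1.0 - j / inner_steps)
            z = toy_denoiser(z, t_inner)
        previews.append(z)  # a viable "image" available at any stop time
        if stop_after is not None and i + 1 >= stop_after:
            return previews  # anytime stop: return what we have so far
        # Outer update: re-noise the clean estimate to the next noise level.
        t_next = 1.0 - (i + 1) / outer_steps
        x = z + math.sqrt(t_next) * 0.1 * random.gauss(0.0, 1.0)
    return previews
```

Because each outer iteration ends with a fully denoised estimate, interrupting the sampler after any outer step returns the same intermediate previews the full run would have produced up to that point.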