DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Abstract

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models remains challenging: the enormous computational cost results in prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method splits the model input into multiple patches and assigns each patch to a GPU. However, naively implementing such an algorithm breaks the interaction between patches and loses fidelity, while incorporating such an interaction incurs tremendous communication overhead. To overcome this dilemma, we observe that the inputs of adjacent diffusion steps are highly similar and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step. Our method therefore supports asynchronous communication, which can be pipelined with computation. Extensive experiments show that our method can be applied to the recent Stable Diffusion XL with no quality degradation and achieves up to a 6.1× speedup on eight NVIDIA A100 GPUs compared to a single one. Our code is publicly available at https://github.com/mit-han-lab/distrifuser.
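The idea of reusing stale context can be illustrated with a small NumPy simulation. This is a toy sketch under stated assumptions, not the authors' implementation (which lives in the repository linked above): the function names `layer` and `displaced_patch_step` are hypothetical, the "layer" is a simple 3-tap vertical blur standing in for any operation that mixes information across patches, and the GPUs are simulated by a Python loop with no real communication.

```python
import numpy as np

def layer(x):
    """Toy stand-in (assumption) for a layer that mixes rows: 3-tap vertical blur."""
    padded = np.pad(x, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

def displaced_patch_step(x_t, stale_full, num_devices):
    """One step where each simulated device owns a horizontal strip of rows.

    Each device combines its fresh rows from the current step with stale rows
    of the other strips taken from the previous step, then keeps only the
    output rows it owns.
    """
    bounds = np.linspace(0, x_t.shape[0], num_devices + 1, dtype=int)
    out = np.empty_like(x_t)
    for d in range(num_devices):
        lo, hi = bounds[d], bounds[d + 1]
        local = stale_full.copy()   # context from the previous step
        local[lo:hi] = x_t[lo:hi]   # fresh rows owned by device d
        out[lo:hi] = layer(local)[lo:hi]
    return out

rng = np.random.default_rng(0)
x_prev = rng.normal(size=(16, 8))
# Assumption: inputs of adjacent steps differ only slightly.
x_curr = x_prev + 0.01 * rng.normal(size=(16, 8))

exact = layer(x_curr)                                   # full synchronization
approx = displaced_patch_step(x_curr, x_prev, num_devices=4)
# Naive patch parallelism: each strip is processed in isolation.
naive = np.concatenate([layer(p) for p in np.split(x_curr, 4)])

err_displaced = np.abs(approx - exact).max()
err_naive = np.abs(naive - exact).max()
print(f"displaced error: {err_displaced:.4f}, naive error: {err_naive:.4f}")
```

In this toy setting the stale-context result deviates from the fully synchronized one only at strip boundaries, and only by an amount proportional to how much the input changed between steps, whereas processing strips in isolation ignores neighboring rows entirely. Because the stale context is already available when a step begins, the exchange of fresh activations does not have to block the computation, which is what makes asynchronous communication possible.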

