DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Abstract

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with them remains challenging: the enormous computational cost results in prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method splits the model input into multiple patches and assigns each patch to a GPU. However, naïvely implementing such an algorithm breaks the interaction between patches and loses fidelity, while incorporating such an interaction incurs tremendous communication overhead. To overcome this dilemma, we observe the high similarity between the inputs of adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step. Our method therefore supports asynchronous communication, which can be pipelined with computation. Extensive experiments show that our method can be applied to the recent Stable Diffusion XL with no quality degradation and achieves up to a 6.1× speedup on eight NVIDIA A100s compared to one. Our code is publicly available at https://github.com/mit-han-lab/distrifuser.
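
To make the idea of reusing previous-step context concrete, the following is a minimal NumPy sketch, not the code from the distrifuser repository. All names (`toy_layer`, `full_step`, `displaced_patch_step`, `run`) are hypothetical, the "devices" are simulated by a Python loop on one machine, and the toy update rule merely stands in for a real denoising network. The sketch also assumes the first step runs with fresh context, which is a choice made here for simplicity.

```python
import numpy as np


def toy_layer(full):
    """Stand-in for a layer with cross-patch interaction: a vertical 3-tap blur."""
    padded = np.pad(full, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def full_step(x, rate=0.1):
    """Reference update computed on the whole input by a single device."""
    return x + rate * (toy_layer(x) - x)


def displaced_patch_step(x, stale, n_devices, rate=0.1):
    """One step in which each simulated device sees fresh rows for its own
    patch and stale (previous-step) rows for every other patch."""
    bounds = np.linspace(0, x.shape[0], n_devices + 1).astype(int)
    out = np.empty_like(x)
    for d in range(n_devices):
        lo, hi = bounds[d], bounds[d + 1]
        ctx = stale.copy()          # context left over from the previous step
        ctx[lo:hi] = x[lo:hi]       # overwrite own patch with fresh values
        updated = ctx + rate * (toy_layer(ctx) - ctx)
        out[lo:hi] = updated[lo:hi]  # keep only the rows this device owns
    # The inputs of this step become the stale context of the next step.
    # In a real system this exchange could overlap with computation.
    return out, x.copy()


def run(n_steps=20, n_devices=4, seed=0):
    """Run the reference and the simulated patch-parallel loop side by side."""
    rng = np.random.default_rng(seed)
    x_ref = rng.standard_normal((32, 8))
    x_par = x_ref.copy()
    stale = x_ref.copy()  # assumption: the first step uses fresh context
    for _ in range(n_steps):
        x_ref = full_step(x_ref)
        x_par, stale = displaced_patch_step(x_par, stale, n_devices)
    return x_ref, x_par


if __name__ == "__main__":
    ref, par = run()
    print("max abs deviation from single-device result:",
          float(np.max(np.abs(ref - par))))
```

With a single simulated device the two loops coincide exactly. With several devices, the result deviates from the reference only through the one-step-old rows at patch boundaries, which stays small in this toy setting because consecutive steps change the input only slightly.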
