Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling
February 25, 2026
Authors: Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seonghye Cho, Jae-Gil Lee
cs.AI
Abstract
Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet their inference remains computationally expensive. Current diffusion acceleration methods based on distributed parallelism do improve speed, but they suffer from noticeable generation artifacts and fail to deliver acceleration proportional to the number of GPUs. We therefore propose a hybrid parallelism framework that combines a novel data-parallel strategy, condition-based partitioning, with an optimal pipeline-scheduling method, adaptive parallelism switching, to achieve low-latency generation with high output quality in conditional diffusion models. The key ideas are to (i) treat the conditional and unconditional denoising paths as a new axis for data partitioning and (ii) adaptively enable optimal pipeline parallelism according to the denoising discrepancy between these two paths. On two NVIDIA RTX 3090 GPUs, our framework achieves 2.31× and 2.07× latency reductions on SDXL and SD3, respectively, while preserving image quality. This result confirms that our approach generalizes across U-Net-based diffusion models and DiT-based flow-matching architectures. Our approach also outperforms existing methods in acceleration under high-resolution synthesis settings. Code is available at https://github.com/kaist-dmlab/Hybridiff.
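The condition-based partitioning idea can be illustrated with classifier-free guidance, where each denoising step runs a conditional and an unconditional forward pass that are then linearly combined. The minimal sketch below runs the two paths concurrently and merges them with the standard guidance formula; the `denoise` stub, the latent values, and the guidance scale are illustrative assumptions, not the paper's actual implementation (which places each path on a separate GPU).

```python
from concurrent.futures import ThreadPoolExecutor

def denoise(x, conditional):
    # Hypothetical stand-in for one denoiser forward pass; in the paper's
    # setting each path would run on its own GPU (condition-based partitioning).
    return [xi * 0.9 + (0.1 if conditional else 0.0) for xi in x]

def guided_step(x, guidance_scale=7.5):
    # Run the conditional and unconditional denoising paths concurrently,
    # treating the two classifier-free-guidance branches as the data partition.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_cond = pool.submit(denoise, x, True)
        f_uncond = pool.submit(denoise, x, False)
        eps_c, eps_u = f_cond.result(), f_uncond.result()
    # Classifier-free guidance: eps = eps_u + s * (eps_c - eps_u)
    return [u + guidance_scale * (c - u) for c, u in zip(eps_c, eps_u)]

latent = [1.0, 0.5]
print(guided_step(latent))
```

Because the two branches are independent within a step, they parallelize cleanly; the paper's adaptive switching additionally falls back to pipeline parallelism when the discrepancy between the two paths makes this split less profitable.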