

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

December 18, 2025
作者: Jintao Zhang, Kaiwen Zheng, Kai Jiang, Haoxu Wang, Ion Stoica, Joseph E. Gonzalez, Jianfei Chen, Jun Zhu
cs.AI

Abstract

We introduce TurboDiffusion, a video generation acceleration framework that can speed up end-to-end diffusion generation by 100-200x while maintaining video quality. TurboDiffusion mainly relies on several components for acceleration: (1) Attention acceleration: TurboDiffusion uses low-bit SageAttention and trainable Sparse-Linear Attention (SLA) to speed up attention computation. (2) Step distillation: TurboDiffusion adopts rCM for efficient step distillation. (3) W8A8 quantization: TurboDiffusion quantizes model parameters and activations to 8 bits to accelerate linear layers and compress the model. In addition, TurboDiffusion incorporates several other engineering optimizations. We conduct experiments on the Wan2.2-I2V-14B-720P, Wan2.1-T2V-1.3B-480P, Wan2.1-T2V-14B-720P, and Wan2.1-T2V-14B-480P models. Experimental results show that TurboDiffusion achieves 100-200x speedup for video generation even on a single RTX 5090 GPU, while maintaining comparable video quality. The GitHub repository, which includes model checkpoints and easy-to-use code, is available at https://github.com/thu-ml/TurboDiffusion.
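To give a sense of the W8A8 component, the sketch below shows generic symmetric per-tensor int8 quantization of a linear layer's weights and activations, with the matrix multiply accumulated in int32 and dequantized afterward. This is only an illustration of the general W8A8 idea, not the paper's actual kernels or quantization scheme; the function names and the per-tensor scaling choice are assumptions for this example.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: map max |x| to 127.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_linear(x, w):
    # Quantize activations and weights to 8 bits, multiply with
    # int32 accumulation, then dequantize the result to float.
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T
    return acc.astype(np.float32) * (sx * sw)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)   # activations
w = rng.standard_normal((32, 64)).astype(np.float32)  # weight matrix
y_fp = x @ w.T                 # full-precision reference
y_q = w8a8_linear(x, w)        # int8 path
rel_err = np.linalg.norm(y_q - y_fp) / np.linalg.norm(y_fp)
```

On hardware with fast int8 tensor cores, the int8 matmul is what delivers the speedup; the dequantization is a cheap elementwise scale. Real deployments typically use finer-grained (e.g. per-channel) scales than the per-tensor scale shown here to keep the quantization error low.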