TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
December 18, 2025
Authors: Jintao Zhang, Kaiwen Zheng, Kai Jiang, Haoxu Wang, Ion Stoica, Joseph E. Gonzalez, Jianfei Chen, Jun Zhu
cs.AI
Abstract
We introduce TurboDiffusion, a video generation acceleration framework that speeds up end-to-end diffusion generation by 100-200x while maintaining video quality. TurboDiffusion relies on three main components for acceleration: (1) Attention acceleration: low-bit SageAttention and trainable Sparse-Linear Attention (SLA) speed up attention computation. (2) Step distillation: rCM is adopted for efficient step distillation. (3) W8A8 quantization: model parameters and activations are quantized to 8 bits to accelerate linear layers and compress the model. In addition, TurboDiffusion incorporates several other engineering optimizations.
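To make the W8A8 idea concrete, below is a minimal, self-contained PyTorch sketch of the quantization scheme underlying component (3); the same symmetric INT8 mapping is, at a high level, also what low-bit attention applies to the query/key matmul in component (1). The function names `quantize_int8` and `w8a8_linear` are illustrative and not from the TurboDiffusion codebase; the sketch assumes per-tensor scales for activations and per-output-channel scales for weights, and it emulates the INT8 GEMM with an INT32 matmul, whereas a real deployment would call a fused INT8 kernel.

```python
import torch

def quantize_int8(t: torch.Tensor, dim=None):
    # Symmetric INT8 quantization: map the max |value| onto [-127, 127].
    # dim=None -> one scale for the whole tensor (activations here);
    # dim=1    -> one scale per output channel  (weights here).
    amax = t.abs().amax() if dim is None else t.abs().amax(dim=dim, keepdim=True)
    scale = amax.clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale

def w8a8_linear(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Emulated W8A8 linear layer: quantize activations and weights to INT8,
    # accumulate the matmul in INT32, then rescale back to floating point.
    xq, sx = quantize_int8(x)          # per-tensor activation scale
    wq, sw = quantize_int8(w, dim=1)   # per-channel weight scales, shape (out, 1)
    acc = xq.to(torch.int32) @ wq.to(torch.int32).T   # INT32 accumulator
    return acc.to(x.dtype) * sx * sw.T                # dequantize via broadcast

# Quick sanity check against the FP32 reference.
x = torch.randn(4, 64)
w = torch.randn(128, 64)
err = (w8a8_linear(x, w) - x @ w.T).abs().max()
print(f"max |error| vs. FP32 linear: {err.item():.3f}")
```

Per-channel weight scales are a common choice in 8-bit inference because weight magnitudes vary widely across output channels, while a single per-tensor scale usually suffices for activations.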
We conduct experiments on the Wan2.2-I2V-14B-720P, Wan2.1-T2V-1.3B-480P, Wan2.1-T2V-14B-720P, and Wan2.1-T2V-14B-480P models. Experimental results show that TurboDiffusion achieves 100-200x speedup for video generation even on a single RTX 5090 GPU, while maintaining comparable video quality. The GitHub repository, which includes model checkpoints and easy-to-use code, is available at https://github.com/thu-ml/TurboDiffusion.