Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
February 4, 2026
Authors: Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang
cs.AI
Abstract
Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying them to AR models leads to considerable performance degradation, for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism that quantitatively estimates the contribution of each chunk and uses this estimate to determine the chunk's sparsity allocation. This progressive sparsity-increase strategy enables the current chunk to inherit prior knowledge from earlier chunks during generation. Additionally, we introduce Hierarchical Sparse Attention to capture informative historical and local context in a coarse-to-fine manner. This two-level mask selection strategy (i.e., frame level and block level) adaptively handles diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention approaches in both quality (e.g., 84.5 on VBench) and efficiency (e.g., 1.2~1.3× end-to-end speedup). Combined with FP8 quantization and LightVAE, Light Forcing further achieves a 2.3× speedup and 19.7 FPS on an RTX 5090 GPU. Code will be released at https://github.com/chengtao-lv/LightForcing.
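To make the coarse-to-fine, two-level mask selection idea concrete, here is a minimal NumPy sketch: scores are first mean-pooled per frame to keep only the most relevant frames, then pooled per block inside those frames to keep the strongest blocks. The function name, mean-pooling choice, and budget parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hierarchical_sparse_mask(scores, frame_size, block_size, top_frames, top_blocks):
    """Illustrative coarse-to-fine mask selection (assumed, not the paper's code).

    scores: 1D array of per-token attention relevance for one query chunk.
    Coarse step: keep the `top_frames` frames with the highest mean score.
    Fine step: within each kept frame, keep the `top_blocks` highest-scoring
    blocks. Returns a boolean keep-mask over all tokens.
    """
    n = len(scores)
    mask = np.zeros(n, dtype=bool)

    # Frame level (coarse): mean-pool scores per frame, keep the strongest frames.
    n_frames = n // frame_size
    frame_scores = scores[: n_frames * frame_size].reshape(n_frames, frame_size).mean(axis=1)
    kept_frames = np.argsort(frame_scores)[-top_frames:]

    # Block level (fine): inside each kept frame, keep the strongest blocks.
    for f in kept_frames:
        frame = scores[f * frame_size : (f + 1) * frame_size]
        n_blocks = frame_size // block_size
        block_scores = frame[: n_blocks * block_size].reshape(n_blocks, block_size).mean(axis=1)
        kept_blocks = np.argsort(block_scores)[-top_blocks:]
        for b in kept_blocks:
            start = f * frame_size + b * block_size
            mask[start : start + block_size] = True
    return mask
```

Only tokens inside the selected blocks would then participate in attention, which is where the sparsity savings come from.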