Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
February 4, 2026
Authors: Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang
cs.AI
Abstract
Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying them to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism that quantitatively estimates the contribution of each chunk, which determines its sparsity allocation. This progressive sparsity-increase strategy enables the current chunk to inherit prior knowledge from earlier chunks during generation. Additionally, we introduce a Hierarchical Sparse Attention to capture informative historical and local context in a coarse-to-fine manner. Such a two-level mask selection strategy (i.e., frame level and block level) can adaptively handle diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention methods in both quality (e.g., 84.5 on VBench) and efficiency (e.g., 1.2×–1.3× end-to-end speedup). Combined with FP8 quantization and LightVAE, Light Forcing further achieves a 2.3× speedup and 19.7 FPS on an RTX 5090 GPU. Code will be released at https://github.com/chengtao-lv/LightForcing.
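The abstract does not give implementation details for the two-level (frame-level, then block-level) mask selection, so the following is only a minimal NumPy sketch of what such a coarse-to-fine selection over an attention-score map could look like. The function name `hierarchical_mask` and all parameter names are hypothetical, not taken from the paper's code.

```python
import numpy as np

def hierarchical_mask(scores, n_frames, tokens_per_frame,
                      block_size, frame_topk, block_topk):
    """Coarse-to-fine key-mask selection over a (Lq, Lk) score map.

    1. Frame level (coarse): mean-pool per-key scores within each frame
       and keep the `frame_topk` highest-scoring frames.
    2. Block level (fine): within each surviving frame, pool scores per
       block of `block_size` keys and keep the `block_topk` best blocks.
    Returns a boolean mask of shape (Lk,) marking the keys to attend to.
    """
    Lk = n_frames * tokens_per_frame
    assert scores.shape[-1] == Lk
    col = scores.mean(axis=0)                            # (Lk,) key importance
    frame_score = col.reshape(n_frames, tokens_per_frame).mean(axis=1)
    keep_frames = np.argsort(frame_score)[-frame_topk:]  # coarse selection

    mask = np.zeros(Lk, dtype=bool)
    bpf = tokens_per_frame // block_size                 # blocks per frame
    for f in keep_frames:
        seg = col[f * tokens_per_frame:(f + 1) * tokens_per_frame]
        blk = seg.reshape(bpf, block_size).mean(axis=1)
        for b in np.argsort(blk)[-block_topk:]:          # fine selection
            s = f * tokens_per_frame + b * block_size
            mask[s:s + block_size] = True
    return mask

# Toy usage: 4 frames of 8 tokens, keep 2 frames and 1 block per frame.
rng = np.random.default_rng(0)
scores = rng.random((8, 32))
mask = hierarchical_mask(scores, n_frames=4, tokens_per_frame=8,
                         block_size=4, frame_topk=2, block_topk=1)
# keeps 2 frames x 1 block x 4 tokens = 8 of 32 key tokens
```

In a real kernel this mask would gate block-sparse attention computation rather than being materialized densely; the sketch only illustrates the two-stage selection order.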