Efficient Autoregressive Video Diffusion with Dummy Head
January 28, 2026
Authors: Hang Guo, Zhaoyang Jia, Jiahao Li, Bin Li, Yuanhao Cai, Jiangshan Wang, Yawei Li, Yan Lu
cs.AI
Abstract
Autoregressive video diffusion models have recently attracted considerable research interest due to their causal modeling and iterative denoising. In this work, we identify that the multi-head self-attention in these models under-utilizes historical frames: approximately 25% of the heads attend almost exclusively to the current frame, and discarding their KV caches incurs only minor performance degradation. Building on this observation, we propose Dummy Forcing, a simple yet effective method that controls context accessibility across attention heads. Specifically, heterogeneous memory allocation reduces head-wise context redundancy, while dynamic head programming adaptively classifies head types. Moreover, we develop a context-packing technique for more aggressive cache compression. Without additional training, Dummy Forcing delivers up to a 2.0x speedup over the baseline, supporting video generation at 24.3 FPS with less than a 0.5% quality drop. The project page is available at https://csguoh.github.io/project/DummyForcing/.
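To make the head-level pruning concrete, below is a minimal PyTorch sketch of the two steps the abstract describes: flagging "dummy" heads by the attention mass they place on current-frame tokens, then discarding those heads' historical KV cache while normal heads keep full context. The function names (classify_dummy_heads, prune_kv_cache), tensor layouts, and the 0.95 threshold are illustrative assumptions, not the paper's actual implementation.

```python
import torch


def classify_dummy_heads(attn_weights: torch.Tensor,
                         num_current_tokens: int,
                         threshold: float = 0.95) -> torch.Tensor:
    """Flag heads that attend almost exclusively to the current frame.

    attn_weights: [num_heads, query_len, key_len] softmax attention map,
        where the last `num_current_tokens` keys belong to the current frame.
    Returns a boolean mask of shape [num_heads]; True marks a dummy head
    whose historical KV cache can be discarded.
    """
    # Fraction of attention mass on current-frame keys, averaged over queries.
    current_mass = attn_weights[..., -num_current_tokens:].sum(dim=-1).mean(dim=-1)
    return current_mass >= threshold


def prune_kv_cache(k_cache: torch.Tensor,
                   v_cache: torch.Tensor,
                   dummy_mask: torch.Tensor,
                   num_current_tokens: int):
    """Keep full history for normal heads, only the current frame for dummy heads.

    k_cache / v_cache: [num_heads, cache_len, head_dim].
    Returns per-head caches as lists, since cache lengths now differ across
    heads (a simple form of heterogeneous memory allocation).
    """
    pruned_k, pruned_v = [], []
    for h, is_dummy in enumerate(dummy_mask.tolist()):
        keep = slice(-num_current_tokens, None) if is_dummy else slice(None)
        pruned_k.append(k_cache[h, keep])
        pruned_v.append(v_cache[h, keep])
    return pruned_k, pruned_v
```

Under this sketch, a head that puts 95% or more of its attention mass on the current frame retains only the current-frame KV entries, which is where the memory and speedup gains would come from; the paper's dynamic head programming and context packing refine this simple thresholding scheme.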