Training-free Latent Inter-Frame Pruning with Attention Recovery
March 6, 2026
作者: Dennis Menn, Yuedong Yang, Bokun Wang, Xiwen Wei, Mustafa Munir, Feng Liang, Radu Marculescu, Chenfeng Xu, Diana Marculescu
cs.AI
Abstract
Current video generation models suffer from high computational latency, making real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent patches. To this end, we propose the Latent Inter-frame Pruning with Attention Recovery (LIPAR) framework, which detects duplicated latent patches and skips their recomputation. Additionally, we introduce a novel Attention Recovery mechanism that approximates the attention values of pruned tokens, thereby removing the visual artifacts that arise from naively applying the pruning method. Empirically, our method increases video editing throughput by 1.45×, achieving an average of 12.2 FPS on an NVIDIA A6000 compared to the baseline's 8.4 FPS. The proposed method does not compromise generation quality and can be seamlessly integrated with the model without additional training. Our approach effectively bridges the gap between traditional compression algorithms and modern generative pipelines.
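The abstract does not specify how duplicated latent patches are detected. The following is a minimal sketch of one plausible inter-frame pruning test, assuming a per-patch relative L2 distance against the previous frame; the function name, metric, and threshold are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def prune_duplicate_patches(prev_latents, curr_latents, threshold=0.05):
    """Flag latent patches in the current frame that differ from the
    previous frame, so near-identical patches can skip recomputation.

    prev_latents, curr_latents: (num_patches, dim) arrays of latent patches.
    Returns a boolean mask (True = recompute, False = reuse cached result).
    """
    # Per-patch L2 distance to the previous frame, normalized by the
    # previous patch's magnitude so the threshold is scale-invariant.
    diff = np.linalg.norm(curr_latents - prev_latents, axis=1)
    scale = np.linalg.norm(prev_latents, axis=1) + 1e-8
    return (diff / scale) > threshold

# Toy example: 4 patches; patches 1 and 3 are unchanged between frames.
rng = np.random.default_rng(0)
prev = rng.standard_normal((4, 8))
curr = prev.copy()
curr[0] += 1.0   # patch 0 changed
curr[2] += 0.5   # patch 2 changed
mask = prune_duplicate_patches(prev, curr)
print(mask)  # patches 0 and 2 need recomputation; 1 and 3 are reused
```

In a real pipeline, the cached outputs of the pruned (False) patches would be reused downstream, with the paper's Attention Recovery step compensating for the tokens that were dropped from the attention computation.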