Training-free Latent Inter-Frame Pruning with Attention Recovery
March 6, 2026
Authors: Dennis Menn, Yuedong Yang, Bokun Wang, Xiwen Wei, Mustafa Munir, Feng Liang, Radu Marculescu, Chenfeng Xu, Diana Marculescu
cs.AI
Abstract
Current video generation models suffer from high computational latency, making real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent patches. To this end, we propose the Latent Inter-frame Pruning with Attention Recovery (LIPAR) framework, which detects and skips recomputation of duplicated latent patches. Additionally, we introduce a novel Attention Recovery mechanism that approximates the attention values of pruned tokens, thereby removing the visual artifacts that arise from naively applying the pruning method. Empirically, our method increases video editing throughput by 1.45×, achieving 12.2 FPS on average on an NVIDIA A6000 compared to the baseline's 8.4 FPS. The proposed method does not compromise generation quality and can be seamlessly integrated into existing models without additional training. Our approach effectively bridges the gap between traditional compression algorithms and modern generative pipelines.
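The abstract's core idea of detecting and skipping duplicated latent patches can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's method: it marks a patch in the current latent frame as prunable when its cosine similarity to the corresponding patch in the previous frame exceeds a threshold, so only dissimilar patches would be recomputed (the hypothetical function name and threshold are illustrative choices).

```python
import numpy as np

def prune_duplicate_patches(prev_latent, curr_latent, threshold=0.95):
    """Illustrative inter-frame pruning (not the paper's exact criterion).

    Both inputs have shape (num_patches, dim). Returns a boolean mask
    that is True for patches that must be recomputed; patches marked
    False are near-duplicates of the previous frame and could reuse
    cached features instead.
    """
    # Normalize each patch vector, then take the per-patch cosine
    # similarity between corresponding patches of the two frames.
    prev_n = prev_latent / (np.linalg.norm(prev_latent, axis=1, keepdims=True) + 1e-8)
    curr_n = curr_latent / (np.linalg.norm(curr_latent, axis=1, keepdims=True) + 1e-8)
    sim = (prev_n * curr_n).sum(axis=1)
    # Recompute only patches that changed noticeably between frames.
    return sim < threshold

# Usage: six 4-dim patches; only patch 2 changes between frames.
prev = np.arange(24, dtype=float).reshape(6, 4) + 1.0
curr = prev.copy()
curr[2] = -curr[2]  # flip sign: cosine similarity becomes -1
keep = prune_duplicate_patches(prev, curr)
# keep is True only for patch 2; the other five patches are pruned.
```

In a real pipeline the pruned tokens would also need their attention contributions approximated (the Attention Recovery step the abstract describes) rather than simply dropped, which is what avoids the visual artifacts of naive pruning.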