Training-free Latent Inter-Frame Pruning with Attention Recovery

Abstract

Current video generation models suffer from high computational latency, which makes real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent patches. To this end, we propose the Latent Inter-frame Pruning with Attention Recovery (LIPAR) framework, which detects duplicated latent patches and skips recomputing them. We also introduce a novel Attention Recovery mechanism that approximates the attention values of pruned tokens, thereby removing the visual artifacts that arise when the pruning method is applied naively. Empirically, our method increases video editing throughput by 1.45×, achieving an average of 12.2 FPS on an NVIDIA A6000 compared to 8.4 FPS for the baseline. The proposed method does not compromise generation quality and can be seamlessly integrated with the model without additional training. Our approach effectively bridges the gap between traditional compression algorithms and modern generative pipelines.
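To make the idea of skipping duplicated latent patches concrete, the following is a minimal sketch and not the LIPAR implementation. It assumes that redundancy is detected by comparing each latent patch with the co-located patch of the previous frame using a mean absolute difference and a fixed threshold; the abstract does not specify the actual criterion. The names `find_redundant_patches`, `process_frame`, `compute`, and `threshold` are hypothetical, and the Attention Recovery step is deliberately left out.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[float]


def find_redundant_patches(
    prev_frame: Sequence[Patch],
    curr_frame: Sequence[Patch],
    threshold: float = 1e-3,  # assumed tolerance, not taken from the paper
) -> List[bool]:
    """Return a mask that is True where a patch is (nearly) unchanged
    relative to the co-located patch of the previous frame."""
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must contain the same number of patches")
    mask = []
    for prev, curr in zip(prev_frame, curr_frame):
        # Mean absolute difference between co-located latent patches.
        diff = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
        mask.append(diff <= threshold)
    return mask


def process_frame(
    prev_frame: Sequence[Patch],
    prev_outputs: Sequence[float],
    curr_frame: Sequence[Patch],
    compute: Callable[[Patch], float],
    threshold: float = 1e-3,
) -> Tuple[List[float], List[bool]]:
    """Reuse the cached output for redundant patches and call the
    (expensive) `compute` function only for patches that changed."""
    mask = find_redundant_patches(prev_frame, curr_frame, threshold)
    outputs = []
    for i, (patch, skip) in enumerate(zip(curr_frame, mask)):
        outputs.append(prev_outputs[i] if skip else compute(patch))
    return outputs, mask
```

In this toy setting, the saving comes from calling `compute` only on changed patches. In a real attention-based model the pruned tokens would also be missing from the attention computation, which is the gap the Attention Recovery mechanism described above is meant to close.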

