Training-free Latent Inter-Frame Pruning with Attention Recovery

Abstract

Current video generation models suffer from high computational latency, which makes real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent patches. To this end, we propose the Latent Inter-frame Pruning with Attention Recovery (LIPAR) framework, which detects duplicated latent patches and skips recomputing them. Additionally, we introduce a novel Attention Recovery mechanism that approximates the attention values of pruned tokens, thereby removing the visual artifacts that arise when pruning is applied naively. Empirically, our method increases video editing throughput by 1.45×, achieving an average of 12.2 FPS on an NVIDIA A6000 compared to the baseline of 8.4 FPS. The proposed method does not compromise generation quality and can be integrated into the model without additional training. Our approach effectively bridges the gap between traditional compression algorithms and modern generative pipelines.
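To make the idea of skipping duplicated latent patches concrete, the following is a minimal sketch and not the LIPAR implementation. The abstract does not specify how duplicates are detected, so the mean-absolute-difference metric, the `threshold` value, and the names `split_patches` and `process_frame` are all assumptions introduced for illustration. Reusing the previous frame's cached output for a pruned patch is likewise only a stand-in; it is not the Attention Recovery mechanism, whose details are not given here.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[float]


def mean_abs_diff(a: Patch, b: Patch) -> float:
    """Mean absolute difference between two equally sized latent patches."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_patches(
    prev: List[Patch], curr: List[Patch], threshold: float
) -> Tuple[List[int], List[int]]:
    """Return (keep, pruned) patch indices for the current frame.

    A patch is pruned when it is nearly identical to the co-located patch
    of the previous frame. Metric and threshold are assumptions.
    """
    keep, pruned = [], []
    for i, (p, c) in enumerate(zip(prev, curr)):
        (pruned if mean_abs_diff(p, c) < threshold else keep).append(i)
    return keep, pruned


def process_frame(
    prev: List[Patch],
    curr: List[Patch],
    prev_out: List[float],
    compute: Callable[[Patch], float],
    threshold: float = 0.01,
) -> Tuple[List[float], List[int], List[int]]:
    """Recompute only changed patches; reuse cached outputs for the rest."""
    keep, pruned = split_patches(prev, curr, threshold)
    out = list(prev_out)            # start from the cached outputs
    for i in keep:
        out[i] = compute(curr[i])   # expensive model call, only where needed
    return out, keep, pruned


# Toy usage: `sum` stands in for the expensive per-patch model computation.
prev = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
curr = [[0.0, 0.0], [1.5, 1.0], [2.0, 2.001]]
prev_out = [sum(p) for p in prev]
out, keep, pruned = process_frame(prev, curr, prev_out, compute=sum)
print(keep, pruned)  # only patch 1 changed enough to be recomputed
```

In this toy example only one of three patches is recomputed. The reported 1.45× figure is consistent with the stated frame rates, since 12.2 / 8.4 ≈ 1.45.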
