LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
October 6, 2025
Authors: Yang Xiao, Gen Li, Kaiyuan Deng, Yushu Wu, Zheng Zhan, Yanzhi Wang, Xiaolong Ma, Bo Hui
cs.AI
Abstract
Training-free acceleration has emerged as a prominent research direction in video
generation based on diffusion models. The redundancy of latents in diffusion
model inference provides a natural entry point for acceleration. In this paper,
we decompose the inference process into the encoding, denoising, and decoding
stages, and observe that cache-based acceleration methods often lead to
substantial memory surges in the latter two stages. To address this problem, we
analyze the characteristics of inference across different stages and propose
stage-specific strategies for reducing memory consumption: 1) Asynchronous
Cache Swapping, 2) Feature Chunking, and 3) Sliced Latent Decoding (sketched
below). At the same
time, we ensure that the time overhead introduced by these three strategies
remains lower than the acceleration gains themselves. Compared with the
baseline, our approach achieves faster inference speed and lower memory usage,
while keeping quality degradation within an acceptable range. The code is
available at https://github.com/NKUShaw/LightCache.
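
The abstract does not spell out implementation details, but the three strategies map onto familiar PyTorch patterns. The sketches below are minimal illustrations under stated assumptions, not the authors' implementation; all class, function, and parameter names (AsyncCacheSwapper, chunked_forward, decode_latents_in_slices, slice_frames, and so on) are hypothetical.

First, asynchronous cache swapping: one plausible reading is that cached features are offloaded to pinned CPU memory on a side CUDA stream, so the device-to-host transfer overlaps with denoising compute on the main stream rather than stalling it.

```python
import torch

class AsyncCacheSwapper:
    """Hypothetical sketch: cached features are copied to pinned CPU memory
    on a side CUDA stream so the transfer overlaps with denoising compute."""

    def __init__(self):
        self.copy_stream = torch.cuda.Stream()  # dedicated transfer stream
        self.cpu_cache = {}

    def offload(self, key, feature):
        # Pinned host memory is required for a truly asynchronous D2H copy.
        host_buf = torch.empty(feature.shape, dtype=feature.dtype,
                               device="cpu", pin_memory=True)
        # Let the transfer stream see the finished feature before copying.
        self.copy_stream.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(self.copy_stream):
            host_buf.copy_(feature, non_blocking=True)
            # Keep the source memory reserved until the side-stream copy ends.
            feature.record_stream(self.copy_stream)
        self.cpu_cache[key] = host_buf

    def fetch(self, key, device="cuda"):
        # Ensure the offload finished before the main stream reuses the data.
        torch.cuda.current_stream().wait_stream(self.copy_stream)
        return self.cpu_cache[key].to(device, non_blocking=True)
```

Second, feature chunking can be read as splitting a large activation tensor and running a memory-heavy module piecewise, capping peak activation memory at the cost of extra kernel launches:

```python
import torch

def chunked_forward(module, feature, num_chunks=4, dim=0):
    # Run the module on pieces of the feature tensor and re-concatenate.
    # Only valid when `module` acts independently along `dim`
    # (e.g., a batch or frame axis).
    parts = [module(chunk) for chunk in feature.chunk(num_chunks, dim=dim)]
    return torch.cat(parts, dim=dim)
```

Third, sliced decoding of latents: decode a few frames at a time instead of the whole clip. The sketch assumes the decoder treats frame slices independently; a VAE with temporal convolutions would need overlapping slices or stateful decoding. For comparison, diffusers' AutoencoderKL exposes enable_slicing() and enable_tiling() for related decode-memory savings.

```python
import torch

@torch.no_grad()
def decode_latents_in_slices(vae, latents, slice_frames=4):
    # latents: (batch, channels, frames, height, width) video latents.
    # Decoding the whole clip at once makes peak activation memory grow
    # with the frame count; decoding a few frames at a time bounds it.
    outputs = []
    for start in range(0, latents.shape[2], slice_frames):
        outputs.append(vae.decode(latents[:, :, start:start + slice_frames]))
    return torch.cat(outputs, dim=2)
```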