SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

February 27, 2026
Authors: Yasaman Haghighi, Alexandre Alahi
cs.AI

Abstract

Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoising steps. This has motivated a growing line of research on accelerating diffusion inference. Among training-free acceleration methods, caching reduces computation by reusing previously computed model outputs across timesteps. Existing caching methods rely on heuristic criteria to choose cache/reuse timesteps and require extensive tuning. We address this limitation with a principled sensitivity-aware caching framework. Specifically, we formalize the caching error through an analysis of the model output sensitivity to perturbations in the denoising inputs, i.e., the noisy latent and the timestep, and show that this sensitivity is a key predictor of caching error. Based on this analysis, we propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. Our framework provides a theoretical basis for adaptive caching, explains why prior empirical heuristics can be partially effective, and extends them to a dynamic, sample-specific approach. Experiments on Wan 2.1, CogVideoX, and LTX-Video show that SenCache achieves better visual quality than existing caching methods under similar computational budgets.
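For intuition only, here is a minimal Python/PyTorch sketch of the kind of per-step caching decision the abstract describes: the cached model output is reused whenever an estimated caching error, approximated here as the input drift since the last real forward pass scaled by a sensitivity coefficient, stays below a tolerance. The function names, the `sensitivity` and `tol` parameters, the drift-based error proxy, and the toy update rule are all assumptions for illustration; they are not the paper's actual SenCache criterion.

```python
import torch

def denoise_with_cache(model, latent, timesteps, sensitivity=1.0, tol=0.05):
    """Toy denoising loop that reuses a cached model output while the
    estimated caching error stays small.

    The error is approximated as sensitivity * ||latent - latent_at_last_compute||,
    i.e., input drift scaled by a sensitivity coefficient. `sensitivity`, `tol`,
    and the latent update rule are illustrative placeholders, not SenCache's
    actual quantities.
    """
    cached_out, latent_ref = None, None
    for t in timesteps:
        # Cheap reuse criterion: how far has the input drifted since the last real forward pass?
        drift = float("inf") if latent_ref is None else (latent - latent_ref).norm().item()
        if cached_out is not None and sensitivity * drift < tol:
            out = cached_out                          # reuse: skip the expensive forward pass
        else:
            with torch.no_grad():
                out = model(latent, t)                # recompute and refresh the cache
            cached_out, latent_ref = out, latent.clone()
        latent = latent - 0.1 * out                   # placeholder update step, not a real sampler
    return latent

if __name__ == "__main__":
    dummy_model = lambda x, t: 0.01 * t * x           # hypothetical stand-in for a video diffusion denoiser
    z = torch.randn(1, 4, 8, 8)
    print(denoise_with_cache(dummy_model, z, timesteps=list(range(10, 0, -1))).shape)
```

The point this mirrors is that the reuse decision itself is cheap (a norm computation and a comparison), so every skipped timestep genuinely avoids a full denoiser call.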