SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

Abstract

Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive because of the large number of sequential denoising steps. This has motivated a growing line of research on accelerating diffusion inference. Among training-free acceleration methods, caching reduces computation by reusing model outputs computed at earlier timesteps. Existing caching methods rely on heuristic criteria to choose when to cache and when to reuse, and they require extensive tuning. We address this limitation with a principled sensitivity-aware caching framework. Specifically, we formalize the caching error by analyzing the sensitivity of the model output to perturbations in the denoising inputs, i.e., the noisy latent and the timestep, and show that this sensitivity is a key predictor of caching error. Based on this analysis, we propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. Our framework provides a theoretical basis for adaptive caching, explains why prior empirical heuristics can be partially effective, and extends them to a dynamic, sample-specific approach. Experiments on Wan 2.1, CogVideoX, and LTX-Video show that SenCache achieves better visual quality than existing caching methods under similar computational budgets.
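
To make the idea of a sensitivity-driven reuse decision concrete, the following is a minimal toy sketch, not the SenCache algorithm itself. It assumes a first-order view in which the change in the model output is roughly bounded by (sensitivity to the latent) x (latent drift) plus (sensitivity to the timestep) x (timestep drift). The names `toy_denoiser`, `estimate_sensitivity`, `sample_with_cache`, and the threshold `tol` are all hypothetical, and the finite-difference sensitivity estimate is an illustrative stand-in rather than anything stated in the abstract.

```python
import math


def toy_denoiser(x, t):
    """Stand-in for an expensive network call (NOT a real diffusion model)."""
    return [math.tanh(v) * (1.0 - t) for v in x]


def norm_diff(a, b):
    """Euclidean norm of a - b for two equal-length lists."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def estimate_sensitivity(f, x, t, eps=1e-3):
    """Crude finite-difference sensitivities along one fixed direction.

    Assumption for illustration only: this costs two extra evaluations of f,
    so a practical method would need a cheaper estimate.
    """
    base = f(x, t)
    step = eps / math.sqrt(len(x))  # perturbation of the latent with norm eps
    s_x = norm_diff(f([v + step for v in x], t), base) / eps
    s_t = norm_diff(f(x, t + eps), base) / eps
    return s_x, s_t


def sample_with_cache(f, x, timesteps, tol):
    """Euler sampling loop that reuses the cached output when the predicted
    first-order change since the last refresh stays below tol."""
    cached = None  # (output, latent at refresh, timestep at refresh, s_x, s_t)
    calls = 0      # counts main denoiser calls only, not the sensitivity probes
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        reuse = False
        if cached is not None:
            out, x_ref, t_ref, s_x, s_t = cached
            predicted = s_x * norm_diff(x, x_ref) + s_t * abs(t - t_ref)
            reuse = predicted < tol
        if not reuse:
            out = f(x, t)
            calls += 1
            s_x, s_t = estimate_sensitivity(f, x, t)
            cached = (out, list(x), t, s_x, s_t)
        # Euler update using either the fresh or the reused output.
        x = [v + (t_next - t) * o for v, o in zip(x, out)]
    return x, calls


timesteps = [1.0 - i / 10 for i in range(11)]
x_final, n_calls = sample_with_cache(toy_denoiser, [1.0, -0.5, 0.25], timesteps, tol=0.05)
```

With `tol=0.0` the predicted change is never below the threshold, so the loop falls back to evaluating the denoiser at every step; larger values of `tol` trade accuracy for fewer evaluations. Because the decision depends on the current latent, different samples can refresh at different timesteps, which is the per-sample behavior the abstract describes.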
