Motion-Aware Caching for Efficient Autoregressive Video Generation
May 3, 2026
Authors: Jing Xu, Yuexiao Ma, Songwei Liu, Xuzhe Zheng, Shiwei Liu, Chenqian Yan, Xiawu Zheng, Rongrong Ji, Fei Chao, Xing Wang
cs.AI
Abstract
Autoregressive video generation paradigms offer theoretical promise for long video synthesis, yet their practical deployment is hindered by the computational burden of sequential iterative denoising. While cache reuse strategies can accelerate generation by skipping redundant denoising steps, existing methods rely on coarse-grained chunk-level skipping that fails to capture fine-grained pixel dynamics. This oversight is critical: pixels with high motion require more denoising steps to prevent error accumulation, while static pixels tolerate aggressive skipping. We formalize this insight theoretically by linking cache errors to residual instability, and propose MotionCache, a motion-aware cache framework that exploits inter-frame differences as a lightweight proxy for pixel-level motion characteristics. MotionCache employs a coarse-to-fine strategy: an initial warm-up phase establishes semantic coherence, followed by motion-weighted cache reuse that dynamically adjusts update frequencies per token. Extensive experiments on state-of-the-art models such as SkyReels-V2 and MAGI-1 demonstrate that MotionCache achieves significant speedups of 6.28× and 1.64×, respectively, while effectively preserving generation quality (VBench score drops of only 1% and 0.01%, respectively). The code is available at https://github.com/ywlq/MotionCache.
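The core mechanism described in the abstract — a warm-up phase followed by motion-weighted cache reuse, where inter-frame differences serve as a per-token motion proxy — can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation; the function name, the interval-mapping rule, and all parameters (`warmup_steps`, `base_interval`, `max_interval`) are assumptions for exposition.

```python
import numpy as np

def motion_weighted_update_mask(prev_frame, curr_frame, step,
                                warmup_steps=2, base_interval=2, max_interval=6):
    """Decide per token whether to recompute at this denoising step
    or reuse the cached value. Illustrative sketch only."""
    if step < warmup_steps:
        # Warm-up phase: refresh every token to establish semantic coherence.
        return np.ones(curr_frame.shape, dtype=bool)

    # Inter-frame difference as a lightweight proxy for per-token motion.
    motion = np.abs(curr_frame - prev_frame)
    m = motion / (motion.max() + 1e-8)  # normalize to [0, 1]

    # High-motion tokens get a short update interval (frequent recomputation);
    # static tokens get a long interval (aggressive cache reuse).
    interval = np.round(
        max_interval - m * (max_interval - base_interval)
    ).astype(int)

    # A token is refreshed whenever the step index is a multiple of its interval.
    return (step % interval) == 0
```

Tokens flagged `True` in the mask would run the full denoising update, while the rest reuse cached activations, so the per-token compute budget tracks local motion rather than a uniform chunk-level schedule.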