Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
February 3, 2026
Authors: Haocheng Xi, Shuo Yang, Yilong Zhao, Muyang Li, Han Cai, Xingyang Li, Yujun Lin, Zhuoyang Zhang, Jintao Zhang, Xiuyu Li, Zhiying Xu, Jun Wu, Chenfeng Xu, Ion Stoica, Song Han, Kurt Keutzer
cs.AI
Abstract
Despite rapid progress in autoregressive video diffusion, an emerging system-algorithm bottleneck limits both deployability and generation capability: KV cache memory. In autoregressive video generation models, the KV cache grows with generation history and quickly dominates GPU memory, often exceeding 30 GB, which prevents deployment on widely available hardware. More critically, constrained KV cache budgets restrict the effective working memory, directly degrading long-horizon consistency in identity, layout, and motion. To address this challenge, we present Quant VideoGen (QVG), a training-free KV cache quantization framework for autoregressive video diffusion models. QVG leverages video spatiotemporal redundancy through Semantic-Aware Smoothing, producing low-magnitude, quantization-friendly residuals. It further introduces Progressive Residual Quantization, a coarse-to-fine multi-stage scheme that reduces quantization error while enabling a smooth quality-memory trade-off. Across LongCat Video, HY WorldPlay, and Self Forcing benchmarks, QVG establishes a new Pareto frontier between quality and memory efficiency, reducing KV cache memory by up to 7.0× with less than 4% end-to-end latency overhead while consistently outperforming existing baselines in generation quality.
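The two ideas named in the abstract can be illustrated with a small NumPy sketch. The abstract does not specify the algorithms, so everything here is an assumption made for illustration: "smoothing" is approximated by subtracting k-means centroids from similar tokens, and "progressive residual quantization" by re-quantizing the leftover error at each stage. The function names (`quantize_uniform`, `smooth_by_centroids`, `progressive_residual_quantize`) and parameters are hypothetical and are not taken from the QVG code base.

```python
import numpy as np


def quantize_uniform(x, bits=2):
    """Per-row asymmetric uniform quantization.

    Returns the dequantized values so the error is easy to inspect.
    A real cache would store the integer codes plus per-row scale and offset.
    """
    levels = 2 ** bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)  # avoid divide-by-zero
    codes = np.clip(np.round((x - lo) / scale), 0, levels)
    return codes * scale + lo


def smooth_by_centroids(x, num_groups=8, iters=10, seed=0):
    """Assumed stand-in for smoothing: plain k-means over token vectors."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), num_groups, replace=False)].copy()
    for _ in range(iters):
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)
        for g in range(num_groups):
            members = x[assign == g]
            if len(members):  # keep the old centroid if a group is empty
                centroids[g] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(axis=1)


def progressive_residual_quantize(x, stages=2, bits=2, num_groups=8):
    """Coarse-to-fine sketch: each stage quantizes what earlier stages missed."""
    approx = np.zeros_like(x)
    errors = []
    for stage in range(stages):
        residual = x - approx
        centroids, assign = smooth_by_centroids(residual, num_groups, seed=stage)
        # Subtracting the group centroid leaves a low-magnitude residual.
        smoothed = residual - centroids[assign]
        approx = approx + centroids[assign] + quantize_uniform(smoothed, bits)
        errors.append(float(np.mean((x - approx) ** 2)))
    return approx, errors


if __name__ == "__main__":
    # Synthetic "keys": 256 tokens of dimension 16 drawn around 8 cluster centers.
    rng = np.random.default_rng(0)
    centers = rng.normal(0.0, 3.0, size=(8, 16))
    keys = centers[rng.integers(0, 8, size=256)] + rng.normal(0.0, 0.1, size=(256, 16))
    _, stage_errors = progressive_residual_quantize(keys, stages=3)
    print("MSE after each stage:", stage_errors)
```

Adding stages spends more bits per value in exchange for lower reconstruction error, which is one plausible reading of the "smooth quality-memory trade-off" the abstract describes.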