Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
February 3, 2026
Authors: Haocheng Xi, Shuo Yang, Yilong Zhao, Muyang Li, Han Cai, Xingyang Li, Yujun Lin, Zhuoyang Zhang, Jintao Zhang, Xiuyu Li, Zhiying Xu, Jun Wu, Chenfeng Xu, Ion Stoica, Song Han, Kurt Keutzer
cs.AI
Abstract
Despite rapid progress in autoregressive video diffusion, an emerging system-algorithm bottleneck limits both deployability and generation capability: KV-cache memory. In autoregressive video generation models, the KV cache grows with the generation history and quickly dominates GPU memory, often exceeding 30 GB, which prevents deployment on widely available hardware. More critically, constrained KV-cache budgets restrict the effective working memory, directly degrading long-horizon consistency in identity, layout, and motion. To address this challenge, we present Quant VideoGen (QVG), a training-free KV-cache quantization framework for autoregressive video diffusion models. QVG leverages video spatiotemporal redundancy through Semantic-Aware Smoothing, producing low-magnitude, quantization-friendly residuals. It further introduces Progressive Residual Quantization, a coarse-to-fine multi-stage scheme that reduces quantization error while enabling a smooth quality-memory trade-off. Across LongCat Video, HY WorldPlay, and Self Forcing benchmarks, QVG establishes a new Pareto frontier between quality and memory efficiency, reducing KV-cache memory by up to 7.0× with less than 4% end-to-end latency overhead while consistently outperforming existing baselines in generation quality.
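The abstract names the two mechanisms without giving their details. The plain-Python toy sketch below is not QVG's implementation. It rests on several assumptions. "Semantic-aware smoothing" is approximated by subtracting a shared mean vector from a group of similar tokens. Each progressive stage applies asymmetric uniform 2-bit quantization with a per-vector scale and offset. The metadata cost is assumed to be 32 bits (an FP16 scale plus an FP16 offset) per vector per stage. All function names (`quantize_2bit`, `smooth_group`, `progressive_residual_quantize`, `compression_ratio`) are hypothetical.

```python
import random
import statistics


def quantize_2bit(x):
    """Asymmetric uniform 2-bit quantization of one vector; returns dequantized values."""
    lo, hi = min(x), max(x)
    scale = (hi - lo) / 3 if hi > lo else 1.0  # 2 bits -> 4 levels, codes 0..3
    codes = [min(3, max(0, round((v - lo) / scale))) for v in x]
    return [lo + c * scale for c in codes]


def smooth_group(tokens):
    """Hypothetical smoothing: subtract the group's mean vector (kept in full precision)."""
    dim = len(tokens[0])
    centroid = [statistics.fmean(t[i] for t in tokens) for i in range(dim)]
    residuals = [[v - c for v, c in zip(t, centroid)] for t in tokens]
    return centroid, residuals


def progressive_residual_quantize(vec, stages):
    """Coarse to fine: each stage 2-bit-quantizes what the earlier stages missed."""
    approx = [0.0] * len(vec)
    max_errors = []
    for _ in range(stages):
        residual = [v - a for v, a in zip(vec, approx)]
        approx = [a + d for a, d in zip(approx, quantize_2bit(residual))]
        max_errors.append(max(abs(v - a) for v, a in zip(vec, approx)))
    return approx, max_errors


def compression_ratio(head_dim, stages, meta_bits=32):
    """FP16 bits vs. stored bits per element: 2-bit codes plus assumed per-vector
    metadata for every stage. Ignores centroid storage (a simplifying assumption)."""
    return 16 / (stages * (2 + meta_bits / head_dim))


def mean_abs_error(a, b):
    return statistics.fmean(abs(x - y) for x, y in zip(a, b))


random.seed(0)
dim = 64
base = [random.uniform(-2.0, 2.0) for _ in range(dim)]
# Tokens from one "semantic group": a shared pattern plus small per-token noise.
tokens = [[b + random.gauss(0.0, 0.05) for b in base] for _ in range(16)]

raw_err = statistics.fmean(mean_abs_error(t, quantize_2bit(t)) for t in tokens)

centroid, residuals = smooth_group(tokens)
# The centroid is stored exactly, so reconstruction error equals residual error.
smoothed_err = statistics.fmean(mean_abs_error(r, quantize_2bit(r)) for r in residuals)

_, stage_errors = progressive_residual_quantize(residuals[0], stages=3)

print(f"raw 2-bit MAE:      {raw_err:.4f}")
print(f"smoothed 2-bit MAE: {smoothed_err:.4f}")
print("max error per stage:", [f"{e:.5f}" for e in stage_errors])
for s in (1, 2, 3):
    print(f"{s} stage(s): ~{compression_ratio(128, s):.2f}x vs FP16")
```

Smoothing helps in this toy because the quantization step scales with each vector's value range, and residuals around a shared pattern have a much smaller range than the raw tokens. With this quantizer, each extra residual stage cuts the worst-case error by at least 3×. Each stage also lowers the compression ratio, which mirrors the quality-memory trade-off described above. The abstract does not specify how QVG actually forms groups or stages.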