Q-ARVD: Quantizing Autoregressive Video Diffusion Models

Abstract

Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite this potential, their substantial inference cost remains a major obstacle to practical deployment, making model quantization a natural direction for improving efficiency. However, quantization of ARVDs remains largely unexplored. Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behavior that differs from that of bidirectional diffusion models. In this paper, we identify two critical challenges in quantizing ARVDs: (C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation can induce severely skewed quantization sensitivity across frames, following an exponential-like decay pattern. (C2) Prominent and heterogeneous outlier patterns in weights. Weight distributions exhibit pronounced outlier channels, whose patterns vary substantially across layer types and block depths. To address these issues, we propose Q-ARVD, a novel framework for accurate ARVD quantization. (S1) To tackle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality-aware frame-weighting mechanism into the quantization objective. (S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces outlier-aware adaptive dual-scale quantization, which automatically detects whether outlier channels are present in a given layer and how many there are, and isolates them to protect normal channels. Extensive experiments demonstrate the superiority of Q-ARVD.
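To make the two ideas in (S1) and (S2) concrete, the following is a minimal NumPy sketch, not the authors' implementation. Every name in it (frame_weights, frame_weighted_mse, dual_scale_quantize, the decay parameter, the threshold k) is hypothetical. The exponential weighting is only an assumption suggested by the exponential-like decay noted in (C1), and the median-plus-k-times-MAD detection rule is a stand-in for whatever detector Q-ARVD actually uses; the abstract specifies neither.

```python
import numpy as np


def frame_weights(num_frames, decay=0.5):
    """Hypothetical exponentially decaying per-frame weights, normalized to sum to 1."""
    w = decay ** np.arange(num_frames)
    return w / w.sum()


def frame_weighted_mse(ref, quant, weights):
    """Weighted sum of per-frame MSE; ref and quant have shape (frames, ...)."""
    per_frame = ((ref - quant) ** 2).reshape(ref.shape[0], -1).mean(axis=1)
    return float(np.dot(weights, per_frame))


def quantize_symmetric(w, scale, bits=4):
    """Symmetric uniform fake-quantization: round to the integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale


def dual_scale_quantize(weight, bits=4, k=6.0):
    """Quantize a (out, in) weight matrix with separate scales for outlier
    and normal input channels (columns).

    Assumed detection rule: a column is an outlier if its max |w| exceeds
    median + k * MAD of the per-column maxima. If no column is flagged,
    this reduces to ordinary single-scale quantization.
    """
    qmax = 2 ** (bits - 1) - 1
    ch_max = np.abs(weight).max(axis=0)
    med = np.median(ch_max)
    mad = np.median(np.abs(ch_max - med)) + 1e-12
    outlier = ch_max > med + k * mad

    out = np.empty_like(weight)
    for mask in (outlier, ~outlier):
        if mask.any():
            # One scale per group, so outliers do not stretch the normal grid.
            scale = max(np.abs(weight[:, mask]).max() / qmax, 1e-12)
            out[:, mask] = quantize_symmetric(weight[:, mask], scale, bits)
    return out, outlier
```

In this sketch the number of outlier channels is not fixed in advance: the mask may be empty or contain several columns, which mirrors the abstract's description of detecting both the presence and the quantity of outliers per layer. Isolating the flagged columns keeps a few large-magnitude channels from inflating the quantization step used for all the others.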