Q-ARVD: Quantizing Autoregressive Video Diffusion Models

Abstract

Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite this potential, the high inference cost of ARVDs remains a major obstacle to practical deployment, which makes model quantization a natural direction for improving efficiency. However, quantization of ARVDs remains largely unexplored. Our empirical analysis shows that directly applying off-the-shelf quantization schemes designed for standard diffusion transformers to ARVDs yields suboptimal performance, revealing quantization behavior that differs markedly from that of bidirectional diffusion models. In this paper, we identify two key challenges in quantizing ARVDs. (C1) Highly unbalanced frame-wise quantization sensitivity: error accumulation during autoregressive generation can make quantization sensitivity severely skewed across frames, following an exponential-like decay pattern. (C2) Prominent and heterogeneous outlier patterns in weights: weight distributions contain pronounced outlier channels whose patterns vary substantially across layer types and block depths. To address these challenges, we propose Q-ARVD, a novel framework for accurate ARVD quantization. (S1) To handle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality-aware frame-weighting mechanism into the quantization objective. (S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces outlier-aware adaptive dual-scale quantization, which automatically detects the presence and number of outlier channels in an arbitrary layer and isolates them to protect the normal channels. Extensive experiments demonstrate the superiority of Q-ARVD.
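
To make the two ideas in (S1) and (S2) concrete, the following is a minimal NumPy sketch, not the Q-ARVD implementation. The abstract does not specify the weighting formula, the outlier detection rule, or the quantizer, so everything here is an assumption chosen for illustration: the function names `frame_weighted_mse`, `fake_quant`, and `dual_scale_quantize`, the median-ratio detection rule with its `ratio` parameter, the symmetric uniform quantizer, and the exponentially decaying placeholder weights.

```python
import numpy as np


def frame_weighted_mse(ref, quant, frame_weights):
    """Weighted mean of per-frame squared errors.

    ref, quant: arrays of shape (num_frames, ...) holding full-precision
    and quantized outputs. frame_weights: non-negative, shape (num_frames,).
    How the weights are derived is NOT given in the abstract; they are
    simply an input here.
    """
    ref = np.asarray(ref, dtype=np.float64)
    quant = np.asarray(quant, dtype=np.float64)
    w = np.asarray(frame_weights, dtype=np.float64)
    per_frame = ((ref - quant) ** 2).reshape(ref.shape[0], -1).mean(axis=1)
    return float((w / w.sum() * per_frame).sum())


def fake_quant(x, scale, bits):
    """Symmetric uniform quantize-then-dequantize with a single scale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale


def dual_scale_quantize(weight, bits=4, ratio=3.0):
    """Quantize outlier and normal channels (columns) with separate scales.

    Assumed detection rule (hypothetical): a column is an outlier when its
    absolute maximum exceeds `ratio` times the median column maximum.
    Returns the dequantized weight and the boolean outlier mask.
    """
    weight = np.asarray(weight, dtype=np.float64)
    qmax = 2 ** (bits - 1) - 1
    col_max = np.abs(weight).max(axis=0)
    outlier = col_max > ratio * np.median(col_max)
    out = np.empty_like(weight)
    # Each column belongs to exactly one of the two groups.
    for mask in (outlier, ~outlier):
        if mask.any():
            scale = max(col_max[mask].max() / qmax, 1e-12)
            out[:, mask] = fake_quant(weight[:, mask], scale, bits)
    return out, outlier


# Toy weight matrix with one artificially inflated channel (column 3).
w = np.sin(np.arange(8 * 16, dtype=np.float64)).reshape(8, 16)
w[:, 3] *= 50.0

w_dual, mask = dual_scale_quantize(w, bits=4)
w_single = fake_quant(w, np.abs(w).max() / 7, bits=4)
print("detected outlier columns:", np.flatnonzero(mask))
print("normal-channel MSE, single scale:", np.mean((w - w_single)[:, ~mask] ** 2))
print("normal-channel MSE, dual scale:", np.mean((w - w_dual)[:, ~mask] ** 2))

# Placeholder exponentially decaying frame weights (an assumption).
weights = np.exp(-0.5 * np.arange(4))
print("weighted loss:", frame_weighted_mse(np.zeros((4, 2)), np.full((4, 2), 0.1), weights))
```

In this toy setup a single shared scale is dominated by the inflated column, so the normal channels are quantized very coarsely; giving the outlier channels their own scale is one simple way to isolate them, in the spirit of (S2). The frame-weighted loss only shows where per-frame weights would enter a quantization objective as described in (S1).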