

StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models

December 18, 2025
Authors: Senmao Li, Kai Wang, Salman Khan, Fahad Shahbaz Khan, Jian Yang, Yaxing Wang
cs.AI

Abstract

Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction, enabling high-quality image generation. However, the VAR paradigm suffers from sharply increased computational complexity and running time at large-scale steps. Existing acceleration methods reduce runtime for these large-scale steps, but they rely on manual step selection and overlook the varying importance of different stages of the generation process. To address this, we present StageVAR, a systematic study and stage-aware acceleration framework for VAR models. Our analysis shows that early steps are critical for preserving semantic and structural consistency and should remain intact, while later steps mainly refine details and can be pruned or approximated for acceleration. Building on these insights, StageVAR introduces a plug-and-play acceleration strategy that exploits the semantic irrelevance and low-rank properties of late-stage computations, without requiring additional training. StageVAR achieves up to a 3.4x speedup with only a 0.01 drop on GenEval and a 0.26 drop on DPG, consistently outperforming existing acceleration baselines. These results highlight stage-aware design as a powerful principle for efficient visual autoregressive image generation.
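
To make the stage-aware principle concrete, here is a minimal PyTorch sketch of the idea as described in the abstract: early scale steps run exactly, while late steps, which mainly refine detail, are replaced by a truncated-SVD low-rank surrogate. All names here (`exact_step`, `cheap_step`, `early_keep`) are illustrative assumptions, not the paper's actual API or method.

```python
import torch

# Toy illustration of stage-aware acceleration (not the paper's code):
# run the semantically critical early scale steps exactly, and swap in a
# low-rank approximation for the late, detail-refining steps.

torch.manual_seed(0)
W = torch.randn(64, 64) / 8.0  # shared toy weights standing in for a transformer block

def exact_step(x: torch.Tensor) -> torch.Tensor:
    """Full computation, used for the early stages that fix semantics and structure."""
    return torch.tanh(x @ W)

def cheap_step(x: torch.Tensor, rank: int = 8) -> torch.Tensor:
    """Low-rank surrogate for late stages, exploiting their reported low-rank structure."""
    u, s, vh = torch.linalg.svd(x, full_matrices=False)
    x_lr = (u[:, :rank] * s[:rank]) @ vh[:rank]  # keep only the top-`rank` components
    return torch.tanh(x_lr @ W)

def generate(scales=(1, 2, 4, 8, 16, 32), early_keep=3):
    """Next-scale prediction loop: exact for the first `early_keep` steps, cheap after."""
    outputs = []
    for i, res in enumerate(scales):
        x = torch.randn(res * res, 64)  # token map at this scale (res x res tokens)
        step = exact_step if i < early_keep else cheap_step
        outputs.append(step(x))
    return outputs

print([tuple(o.shape) for o in generate()])
```

A real VAR step would condition each scale on the tokens of all previous scales; the toy omits that dependency to keep the stage-wise dispatch, and the training-free, plug-and-play flavor of the approach, visible.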