Diversity Has Always Been There in Your Visual Autoregressive Models
November 21, 2025
Authors: Tong Wang, Guanyu Yang, Nian Liu, Kai Wang, Yaxing Wang, Abdelrahman M Shaker, Salman Khan, Fahad Shahbaz Khan, Senmao Li
cs.AI
Abstract
Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm, offering notable advantages in both inference efficiency and image quality compared to traditional multi-step autoregressive (AR) and diffusion models. However, despite their efficiency, VAR models often suffer from diversity collapse, i.e., a reduction in output variability, analogous to that observed in few-step distilled diffusion models. In this paper, we introduce DiverseVAR, a simple yet effective approach that restores the generative diversity of VAR models without requiring any additional training. Our analysis reveals the pivotal component of the feature map as a key factor governing diversity formation at early scales. By suppressing the pivotal component in the model input and amplifying it in the model output, DiverseVAR effectively unlocks the inherent generative potential of VAR models while preserving high-fidelity synthesis. Empirical results demonstrate that our approach substantially enhances generative diversity with only a negligible impact on performance. Our code is publicly available at https://github.com/wangtong627/DiverseVAR.
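As a rough, hypothetical illustration of the suppress-then-amplify idea (not the authors' actual implementation, and the paper's precise definition of the "pivotal component" may differ), one could take the pivotal component of a 2-D feature map to be its top singular component and rescale it:

```python
import numpy as np

def rescale_pivotal_component(feature_map: np.ndarray, scale: float) -> np.ndarray:
    """Rescale the dominant (rank-1) component of a 2-D feature map.

    Hypothetical sketch: the 'pivotal component' is assumed here to be the
    top singular component of the feature map. scale < 1 suppresses it
    (as done on the model input), scale > 1 amplifies it (as done on the
    model output).
    """
    U, S, Vt = np.linalg.svd(feature_map, full_matrices=False)
    S = S.copy()
    S[0] *= scale  # rescale only the leading singular value
    return (U * S) @ Vt

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32))          # stand-in for one feature map
suppressed = rescale_pivotal_component(x, 0.0)  # pivotal component removed
amplified = rescale_pivotal_component(x, 2.0)   # pivotal component strengthened
```

With `scale=1.0` the map is reconstructed unchanged; zeroing the leading singular value drops the matrix rank by one, which is one simple way to verify that only the single dominant direction was touched.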