
Diversity Has Always Been There in Your Visual Autoregressive Models

November 21, 2025
Authors: Tong Wang, Guanyu Yang, Nian Liu, Kai Wang, Yaxing Wang, Abdelrahman M Shaker, Salman Khan, Fahad Shahbaz Khan, Senmao Li
cs.AI

Abstract

Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm, offering notable advantages in both inference efficiency and image quality compared to traditional multi-step autoregressive (AR) and diffusion models. However, despite their efficiency, VAR models often suffer from diversity collapse, i.e., a reduction in output variability analogous to that observed in few-step distilled diffusion models. In this paper, we introduce DiverseVAR, a simple yet effective approach that restores the generative diversity of VAR models without requiring any additional training. Our analysis reveals the pivotal component of the feature map as a key factor governing diversity formation at early scales. By suppressing the pivotal component in the model input and amplifying it in the model output, DiverseVAR effectively unlocks the inherent generative potential of VAR models while preserving high-fidelity synthesis. Empirical results demonstrate that our approach substantially enhances generative diversity with only negligible impact on performance. Our code will be publicly released at https://github.com/wangtong627/DiverseVAR.
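The abstract does not specify how the "pivotal component" is defined or rescaled; one plausible reading, sketched below purely for illustration, is to treat it as the dominant rank-1 (top singular) component of a feature map and rescale it with a factor below 1 at the model input (suppression) or above 1 at the model output (amplification). The function names `pivotal_component` and `rescale_pivotal` are hypothetical and not taken from the paper's code.

```python
import numpy as np


def pivotal_component(x):
    """Rank-1 approximation of a 2-D feature map (tokens x channels) via SVD.

    Taking the top singular component as the 'pivotal component' is an
    assumption for illustration; the paper's exact definition may differ.
    """
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return s[0] * np.outer(u[:, 0], vt[0])


def rescale_pivotal(x, scale):
    """Replace the pivotal component of x with `scale` times itself.

    scale < 1 suppresses the dominant direction (model input);
    scale > 1 amplifies it (model output), mirroring the abstract's
    suppress/amplify idea. scale == 1 leaves x unchanged.
    """
    p = pivotal_component(x)
    return x - p + scale * p


rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8))  # stand-in for a flattened feature map

suppressed = rescale_pivotal(feat, 0.5)  # weaken the dominant direction
amplified = rescale_pivotal(feat, 1.5)   # strengthen it
```

Because the rescaling is linear in the rank-1 term, the energy along the top singular direction is scaled exactly by the chosen factor while all other directions are untouched.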