Taking Shortcuts for Categorical VQA Using Super Neurons

March 11, 2026
Authors: Pierre Musacchio, Jaeyi Jeong, Dahun Kim, Jaesik Park
cs.AI

Abstract

Sparse Attention Vectors (SAVs) have emerged as an excellent training-free alternative to supervised fine-tuning or low-rank adaptation for improving the performance of Vision Language Models (VLMs). At their core, SAVs select a few accurate attention heads for a task of interest and use them as classifiers, rather than relying on the model's own predictions. In a similar spirit, we find that directly probing the raw activations of the VLM, in the form of scalar values, is sufficient to yield accurate classifiers on diverse visually grounded downstream tasks. Shifting focus from attention vectors to scalar activations dramatically enlarges the search space of accurate parameters, allowing us to find more discriminative neurons immediately from the first generated token. We call such activations Super Neurons (SNs). In this probing setting, we discover that enough SNs appear in the shallower layers of the large language model to allow for extreme early exiting: the model can exit from its first layer at the first generated token. Compared to the original network, SNs robustly improve classification performance while achieving a speedup of up to 5.10x.
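To make the probing idea concrete, below is a minimal sketch of how scalar-activation probing of this kind could work: given activations collected from a shallow layer at the first generated token, rank neurons by a discriminability score, keep the top few as "super neurons", and classify with nearest class centroids in that subspace. The abstract does not specify the authors' selection criterion or classifier, so the Fisher-style score, the nearest-centroid rule, the function names, and k=32 here are all illustrative assumptions, demonstrated on synthetic data.

```python
# Hypothetical sketch of super-neuron probing (not the authors' implementation):
# select discriminative scalar activations and use them as a classifier.
import numpy as np

def select_super_neurons(acts, labels, k=32):
    """Rank neurons by a Fisher-style score (between-class variance over
    within-class variance) and keep the top k.

    acts:   (n_samples, n_neurons) raw activations from one layer / one token
    labels: (n_samples,) integer class labels
    """
    overall_mean = acts.mean(axis=0)
    between = np.zeros(acts.shape[1])
    within = np.zeros(acts.shape[1])
    for c in np.unique(labels):
        grp = acts[labels == c]
        between += len(grp) * (grp.mean(axis=0) - overall_mean) ** 2
        within += ((grp - grp.mean(axis=0)) ** 2).sum(axis=0)
    score = between / (within + 1e-8)  # high score = discriminative neuron
    return np.argsort(score)[-k:]

def fit_centroids(acts, labels, idx):
    """Class centroids restricted to the selected super neurons."""
    return {c: acts[labels == c][:, idx].mean(axis=0) for c in np.unique(labels)}

def predict(acts, idx, centroids):
    """Nearest-centroid classification in the super-neuron subspace."""
    sub = acts[:, idx]
    keys = list(centroids)
    dists = np.stack([np.linalg.norm(sub - centroids[c], axis=1) for c in keys])
    return np.array(keys)[dists.argmin(axis=0)]

# Toy demo standing in for "first-layer, first-token" activations.
rng = np.random.default_rng(0)
n, d, n_classes = 600, 4096, 4
y = rng.integers(0, n_classes, n)
X = rng.normal(size=(n, d))
X[:, :8] += y[:, None] * 2.0  # plant a few informative neurons

idx = select_super_neurons(X[:500], y[:500])
cents = fit_centroids(X[:500], y[:500], idx)
acc = (predict(X[500:], idx, cents) == y[500:]).mean()
print(f"held-out accuracy with {len(idx)} super neurons: {acc:.2f}")
```

Because such a probe only needs activations from the first layer at the first generated token, the remaining forward pass can be skipped entirely, which is the source of the reported speedup.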