Taking Shortcuts for Categorical VQA Using Super Neurons

Abstract

Sparse Attention Vectors (SAVs) have emerged as an excellent training-free alternative to supervised finetuning or low-rank adaptation to improve the performance of Vision Language Models (VLMs). At their heart, SAVs select a few accurate attention heads for a task of interest and use them as classifiers, rather than relying on the model's prediction. In a similar spirit, we find that directly probing the raw activations of the VLM, in the form of scalar values, is sufficient to yield accurate classifiers on diverse visually grounded downstream tasks. Shifting focus from attention vectors to scalar activations dramatically increases the search space for accurate parameters, allowing us to find more discriminative neurons immediately from the first generated token. We call such activations Super Neurons (SNs). In this probing setting, we discover that enough SNs appear in the shallower layers of the large language model to allow for extreme early exiting from the first layer of the model at the first generated token. Compared to the original network, SNs robustly improve the classification performance while achieving a speedup of up to 5.10x.
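The abstract does not spell out how Super Neurons are scored or combined, so the following is only a minimal sketch under stated assumptions. It assumes the scalar activations have already been collected, one value per neuron per example (for instance at a single shallow layer for the first generated token). Collecting them is model-specific and omitted here. Each neuron is scored with a nearest-class-mean rule on a small labeled probe set. This borrows the voting idea SAVs use for attention heads, is an assumption, and is not necessarily the paper's criterion. The top-k neurons are then kept and combined by majority vote. All function names (`class_means`, `nearest_mean_predict`, `select_super_neurons`, `predict`) are hypothetical.

```python
from collections import Counter
from statistics import mean


def class_means(values, labels):
    """Mean scalar activation of a single neuron for each class label."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    return {y: mean(vs) for y, vs in groups.items()}


def nearest_mean_predict(value, means):
    """Predict the class whose mean activation is closest to `value`."""
    return min(means, key=lambda y: abs(value - means[y]))


def select_super_neurons(activations, labels, k):
    """Rank neurons by nearest-class-mean accuracy on the probe set and keep the top k.

    activations: list of examples, each a list of scalar activations (one per neuron).
    Assumption: these were read out at one layer for the first generated token.
    Returns a list of (neuron_index, class_means) pairs.
    """
    n_neurons = len(activations[0])
    scored = []
    for j in range(n_neurons):
        column = [row[j] for row in activations]
        means = class_means(column, labels)
        correct = sum(nearest_mean_predict(v, means) == y for v, y in zip(column, labels))
        scored.append((correct / len(labels), j, means))
    # Highest accuracy first; ties broken by the lower neuron index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(j, means) for _, j, means in scored[:k]]


def predict(sample, super_neurons):
    """Majority vote over the selected neurons' nearest-mean predictions."""
    votes = Counter(nearest_mean_predict(sample[j], means) for j, means in super_neurons)
    return votes.most_common(1)[0][0]


# Toy probe set: neuron 1 separates "yes" from "no", while neurons 0 and 2 are noise.
acts = [[0.1, 2.0, 0.5], [0.4, 2.2, 0.1], [0.2, -1.0, 0.4], [0.3, -1.2, 0.2]]
labels = ["yes", "yes", "no", "no"]
sns = select_super_neurons(acts, labels, k=1)
print(sns[0][0])                      # → 1
print(predict([0.0, 1.9, 0.3], sns))  # → yes
```

Classification only needs the selected neurons' values. In principle, then, a forward pass could stop at the deepest layer that contains a selected neuron, which is the intuition behind the early exiting described above. How the paper actually realizes this is not shown in the sketch.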
