Vision Transformer Finetuning Benefits from Non-Smooth Components

Abstract

The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. However, its role in transfer learning remains poorly understood. In this paper, we analyze the ability of vision transformer components to adapt their outputs to changes in inputs, or, in other words, their plasticity. Defined as an average rate of change, it captures the sensitivity to input perturbation; in particular, a high plasticity implies low smoothness. We demonstrate through theoretical analysis and comprehensive experiments that this perspective provides principled guidance in choosing the components to prioritize during adaptation. A key takeaway for practitioners is that the high plasticity of the attention modules and feedforward layers consistently leads to better finetuning performance. Our findings depart from the prevailing assumption that smoothness is desirable, offering a novel perspective on the functional properties of transformers. The code is available at https://github.com/ambroiseodt/vit-plasticity.
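The abstract describes plasticity only as "an average rate of change" that captures sensitivity to input perturbation; it does not give the formal definition. As a rough illustration of that idea, and not the paper's actual estimator, the sketch below averages the ratio ||f(x + d) - f(x)|| / ||d|| over a few inputs and small random perturbations d. The helper `average_rate_of_change`, the perturbation scale `eps`, and the two toy components are all assumptions made for this example.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def average_rate_of_change(f, inputs, eps=1e-2, seed=0):
    """Monte Carlo estimate of the mean of ||f(x + d) - f(x)|| / ||d||.

    Hypothetical helper (not from the paper): f maps a list of floats to a
    list of floats, and d is a Gaussian perturbation rescaled to norm eps.
    """
    rng = random.Random(seed)
    ratios = []
    for x in inputs:
        d = [rng.gauss(0.0, 1.0) for _ in x]
        scale = eps / l2_norm(d)
        d = [scale * di for di in d]
        x_pert = [xi + di for xi, di in zip(x, d)]
        diff = [a - b for a, b in zip(f(x_pert), f(x))]
        ratios.append(l2_norm(diff) / l2_norm(d))
    return sum(ratios) / len(ratios)


# Toy stand-ins for model components (assumed; not actual ViT modules).
def smooth_component(x):
    return [0.5 * xi for xi in x]  # output changes slowly with the input


def plastic_component(x):
    return [3.0 * xi for xi in x]  # output changes quickly with the input


inputs = [[1.0, -2.0, 0.5], [0.3, 0.1, -1.2]]
low = average_rate_of_change(smooth_component, inputs)
high = average_rate_of_change(plastic_component, inputs)
print(f"smooth: {low:.3f}, plastic: {high:.3f}")
```

Under this reading, the component with the larger ratio is the less smooth one, which is the sense in which the abstract says that high plasticity implies low smoothness. For a real vision transformer, `f` would be replaced by a single module (for example an attention block) evaluated on its own input activations.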
