Vision Transformer Finetuning Benefits from Non-Smooth Components
February 6, 2026
Authors: Ambroise Odonnat, Laetitia Chapel, Romain Tavenard, Ievgen Redko
cs.AI
Abstract
The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. However, its role in transfer learning remains poorly understood. In this paper, we analyze the ability of vision transformer components to adapt their outputs to changes in inputs, or, in other words, their plasticity. Defined as an average rate of change, it captures the sensitivity to input perturbation; in particular, a high plasticity implies low smoothness. We demonstrate through theoretical analysis and comprehensive experiments that this perspective provides principled guidance in choosing the components to prioritize during adaptation. A key takeaway for practitioners is that the high plasticity of the attention modules and feedforward layers consistently leads to better finetuning performance. Our findings depart from the prevailing assumption that smoothness is desirable, offering a novel perspective on the functional properties of transformers. The code is available at https://github.com/ambroiseodt/vit-plasticity.
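The abstract defines plasticity as an average rate of change of a component's output under input perturbation. As a minimal illustration (not taken from the paper's codebase; `estimate_plasticity` and the linear stand-in component are hypothetical), one can approximate this quantity with finite differences over random perturbations of fixed norm:

```python
import numpy as np

def estimate_plasticity(f, x, n_probes=32, eps=1e-2, seed=0):
    """Estimate the plasticity of a component f at input x as the
    average rate of change ||f(x + d) - f(x)|| / ||d|| over random
    perturbations d of norm eps.

    This is a finite-difference sketch of the abstract's definition:
    a high value means high sensitivity to input changes, i.e. low
    smoothness of the component around x.
    """
    rng = np.random.default_rng(seed)
    fx = f(x)
    rates = []
    for _ in range(n_probes):
        d = rng.standard_normal(x.shape)
        d *= eps / np.linalg.norm(d)          # perturbation of norm eps
        rates.append(np.linalg.norm(f(x + d) - fx) / eps)
    return float(np.mean(rates))

# Hypothetical stand-in "component": a linear map. For a linear map,
# every rate of change lies between its smallest and largest singular
# values, so the estimate here must fall in [0.5, 2.0].
W = np.array([[2.0, 0.0],
              [0.0, 0.5]])
p = estimate_plasticity(lambda x: W @ x, np.array([1.0, 1.0]))
```

In practice one would apply such a probe to the attention and feedforward blocks of a vision transformer rather than a toy linear map; the same average-rate-of-change measurement carries over directly.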