

AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers

March 16, 2026
Author: Salim Khazem
cs.AI

Abstract

Frozen-backbone transfer with Vision Transformers faces two under-addressed issues: optimization instability when adapters are naively inserted into a fixed feature extractor, and the absence of principled guidance for setting adapter capacity. We introduce AdapterTune, which augments each transformer block with a residual low-rank bottleneck whose up-projection is zero-initialized, guaranteeing that the adapted network starts exactly at the pretrained function and eliminating early-epoch representation drift. On the analytical side, we formalize adapter rank as a capacity budget for approximating downstream task shifts in feature space. The resulting excess-risk decomposition predicts monotonic but diminishing accuracy gains with increasing rank, an "elbow" behavior we confirm through controlled sweeps. We evaluate on 9 datasets and 3 backbone scales with multi-seed reporting throughout. On a core five-dataset transfer suite, AdapterTune improves top-1 accuracy over head-only transfer by +14.9 points on average while training only 0.92% of the parameters required by full fine-tuning, and it outperforms full fine-tuning on 10 of 15 dataset-backbone pairs. Across the full benchmark, AdapterTune improves over head-only transfer on every dataset-backbone pair tested. Ablations on rank, placement, and initialization isolate the effect of each design choice. The code is available at: https://github.com/salimkhazem/adaptertune
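The zero-initialization property described in the abstract can be illustrated with a minimal sketch: a residual low-rank bottleneck h ↦ h + B(Ah), where the up-projection B starts at zero, so the adapted block computes exactly the identity (i.e., the pretrained function) at initialization. This is a hedged NumPy toy, not the paper's implementation; the feature dimension, rank, and init scale are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # feature dimension and adapter rank (illustrative values)

# Residual low-rank bottleneck around a frozen block's output h:
#   adapter(h) = h + B @ (A @ h)
A = rng.standard_normal((r, d)) * 0.02  # down-projection, small random init
B = np.zeros((d, r))                    # up-projection, zero-initialized

def adapter(h):
    # With B = 0, the low-rank update vanishes and the block is the identity,
    # so training starts exactly at the pretrained function.
    return h + B @ (A @ h)

h = rng.standard_normal(d)
assert np.allclose(adapter(h), h)  # no representation drift at initialization
```

Once training updates B away from zero, the same residual path expresses a rank-r correction to the frozen features, which is the capacity budget the paper's rank analysis refers to.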