

AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers

March 16, 2026
Author: Salim Khazem
cs.AI

Abstract

Frozen-backbone transfer with Vision Transformers faces two under-addressed issues: optimization instability when adapters are naively inserted into a fixed feature extractor, and the absence of principled guidance for setting adapter capacity. We introduce AdapterTune, which augments each transformer block with a residual low-rank bottleneck whose up-projection is zero-initialized, guaranteeing that the adapted network starts exactly at the pretrained function and eliminating early-epoch representation drift. On the analytical side, we formalize adapter rank as a capacity budget for approximating downstream task shifts in feature space. The resulting excess-risk decomposition predicts monotonic but diminishing accuracy gains with increasing rank, an "elbow" behavior we confirm through controlled sweeps. We evaluate on 9 datasets and 3 backbone scales with multi-seed reporting throughout. On a core five-dataset transfer suite, AdapterTune improves top-1 accuracy over head-only transfer by +14.9 points on average while training only 0.92% of the parameters required by full fine-tuning, and it outperforms full fine-tuning on 10 of 15 dataset-backbone pairs. Across the full benchmark, AdapterTune improves over head-only transfer on every dataset-backbone pair tested. Ablations on rank, placement, and initialization isolate the effect of each design choice. The code is available at: https://github.com/salimkhazem/adaptertune
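The zero-initialization property described in the abstract can be illustrated with a minimal sketch: a residual low-rank bottleneck h ↦ h + B(Ah), where the up-projection B starts at zero, so the adapted block computes exactly the identity (i.e., the pretrained function) at initialization. This is a hedged NumPy toy, not the paper's implementation; the feature dimension, rank, and init scale are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # feature dimension and adapter rank (illustrative values)

# Residual low-rank bottleneck around a frozen block's output h:
#   adapter(h) = h + B @ (A @ h)
A = rng.standard_normal((r, d)) * 0.02  # down-projection, small random init
B = np.zeros((d, r))                    # up-projection, zero-initialized

def adapter(h):
    # With B = 0, the low-rank update vanishes and the block is the identity,
    # so training starts exactly at the pretrained function.
    return h + B @ (A @ h)

h = rng.standard_normal(d)
assert np.allclose(adapter(h), h)  # no representation drift at initialization
```

Once training updates B away from zero, the same residual path expresses a rank-r correction to the frozen features, which is the capacity budget the paper's rank analysis refers to.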