

AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers

March 16, 2026
作者: Salim Khazem
cs.AI

Abstract
Frozen-backbone transfer with Vision Transformers faces two under-addressed issues: optimization instability when adapters are naively inserted into a fixed feature extractor, and the absence of principled guidance for setting adapter capacity. We introduce AdapterTune, which augments each transformer block with a residual low-rank bottleneck whose up-projection is zero-initialized, guaranteeing that the adapted network starts exactly at the pretrained function and eliminating early-epoch representation drift. On the analytical side, we formalize adapter rank as a capacity budget for approximating downstream task shifts in feature space. The resulting excess-risk decomposition predicts monotonic but diminishing accuracy gains with increasing rank, an "elbow" behavior we confirm through controlled sweeps. We evaluate on 9 datasets and 3 backbone scales with multi-seed reporting throughout. On a core 5-dataset transfer suite, AdapterTune improves top-1 accuracy over head-only transfer by +14.9 points on average while training only 0.92% of the parameters required by full fine-tuning, and outperforms full fine-tuning on 10 of 15 dataset-backbone pairs. Across the full benchmark, AdapterTune improves over head-only transfer on every dataset-backbone pair tested. Ablations on rank, placement, and initialization isolate each design choice. The code is available at: https://github.com/salimkhazem/adaptertune
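The core mechanism described in the abstract — a residual low-rank bottleneck whose up-projection starts at zero, so the adapted network initially computes exactly the frozen pretrained function — can be sketched as follows. This is an illustrative PyTorch sketch based only on the abstract, not the authors' released code; the class name, the ReLU nonlinearity inside the bottleneck, and the exact placement within a block are assumptions.

```python
import torch
import torch.nn as nn


class LowRankAdapter(nn.Module):
    """Residual low-rank bottleneck adapter (illustrative sketch).

    Maps dim -> rank -> dim and adds the result back to the input.
    The up-projection is zero-initialized, so at step 0 the adapter
    contributes exactly zero and the frozen backbone's function is
    unchanged -- no early-epoch representation drift.

    The ReLU bottleneck nonlinearity is an assumption, not confirmed
    by the abstract.
    """

    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.down = nn.Linear(dim, rank)   # down-projection: dim -> rank
        self.up = nn.Linear(rank, dim)     # up-projection: rank -> dim
        # Zero-init the up-projection so the residual branch is a no-op
        # at initialization.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))
```

With this initialization, `adapter(x)` equals `x` exactly before any training step, which is the "starts at the pretrained function" guarantee; only the small `dim * rank`-scale adapter parameters (plus the head) would be trained while the backbone stays frozen.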