AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers

Abstract

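Frozen-backbone transfer with Vision Transformers faces two under-addressed issues: optimization instability when adapters are naively inserted into a fixed feature extractor, and the absence of principled guidance for setting adapter capacity. We introduce AdapterTune, which augments each transformer block with a residual low-rank bottleneck whose up-projection is zero-initialized. This guarantees that the adapted network starts exactly at the pretrained function, eliminating early-epoch representation drift. On the analytical side, we formalize adapter rank as a capacity budget for approximating downstream task shifts in feature space. The resulting excess-risk decomposition predicts monotonic but diminishing accuracy gains with increasing rank, an "elbow" behavior that we confirm through controlled sweeps. We evaluate on 9 datasets and 3 backbone scales, with multi-seed reporting throughout. On a core 5-dataset transfer suite, AdapterTune improves top-1 accuracy over head-only transfer by +14.9 points on average while training only 0.92% of the parameters required by full fine-tuning, and it outperforms full fine-tuning on 10 of 15 dataset-backbone pairs. Across the full benchmark, AdapterTune improves over head-only transfer on every dataset-backbone pair tested. Ablations on rank, placement, and initialization isolate the effect of each design choice. The code is available at: https://github.com/salimkhazem/adaptertune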
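
The zero-initialized residual bottleneck described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It is a dependency-free toy under stated assumptions: the class name LowRankAdapter, the ReLU nonlinearity, the Gaussian initialization of the down-projection, and the omission of bias terms are all illustrative choices that the abstract does not specify. The point it shows is that a zero up-projection makes the adapter an exact identity map at the start of training.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LowRankAdapter:
    """Residual bottleneck x + U * relu(D * x), with U initialized to zero.

    Hypothetical illustration only; names and details are assumptions.
    """

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        # Down-projection D (rank x dim): small random values (assumed init).
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(rank)]
        # Up-projection U (dim x rank): all zeros, so the residual branch
        # contributes nothing until training updates U.
        self.up = [[0.0] * rank for _ in range(dim)]

    def __call__(self, x):
        # Bottleneck activation; ReLU is an assumption, not from the source.
        h = [max(0.0, v) for v in matvec(self.down, x)]
        delta = matvec(self.up, h)
        # Residual connection: frozen features plus the low-rank update.
        return [xi + di for xi, di in zip(x, delta)]

    def num_params(self):
        # Trainable weights in the two projections (biases omitted here).
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)


adapter = LowRankAdapter(dim=8, rank=2)
features = [float(i) for i in range(8)]
# At initialization the adapter returns its input unchanged.
assert adapter(features) == features
# Parameter count grows linearly in rank: 2 * dim * rank.
assert adapter.num_params() == 2 * 8 * 2
```

In this sketch the adapter adds 2 * dim * rank weights per insertion point, which is why rank acts as the capacity knob that the abstract's analysis refers to.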
