SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

Abstract

Parameter-efficient transfer learning (PETL) has emerged as a flourishing research field for adapting large pre-trained models to downstream tasks. It greatly reduces the number of trainable parameters, yet it still grapples with memory challenges during fine-tuning. To address this, the memory-efficient series of methods (METL) avoids backpropagating gradients through the large backbone. However, these methods compromise by relying exclusively on frozen intermediate outputs, which limits the exhaustive exploration of prior knowledge from pre-trained models. Moreover, the dependency and redundancy between cross-layer features are frequently overlooked, which submerges more discriminative representations and causes an inherent performance gap relative to conventional PETL methods. Hence, we propose SHERL, an innovative METL strategy for resource-limited scenarios that decouples the entire adaptation into two successive and complementary processes. In the early route, intermediate outputs are consolidated via an anti-redundancy operation, enhancing their compatibility for subsequent interactions. In the late route, utilizing minimal late pre-trained layers alleviates the peak demand on memory overhead and regulates these fairly flexible features into more adaptive and powerful representations for new domains. Extensive ablations on vision-and-language and language-only tasks show that SHERL combines the strengths of both parameter-efficient and memory-efficient techniques, performing on par or better across diverse architectures with lower memory during fine-tuning. Our code is publicly available at: https://github.com/Paranioar/SHERL.
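The two-route idea can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the anti-redundancy operation, so the weighting rule below (down-weighting a layer whose output is, on average, highly similar to the other layers' outputs) is an assumption made purely for illustration. The names `consolidate`, `adapt`, and `late_layers` are likewise hypothetical. Plain Python lists stand in for feature tensors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def consolidate(layer_outputs):
    """Early route (toy): merge frozen intermediate outputs into one vector.

    ASSUMPTION: the redundancy score is the mean cosine similarity of a
    layer's output to every other layer's output, and weights are a softmax
    over the negated scores. This is an illustrative rule, not SHERL's.
    """
    n = len(layer_outputs)
    scores = []
    for i, feat in enumerate(layer_outputs):
        others = [cosine(feat, layer_outputs[j]) for j in range(n) if j != i]
        redundancy = sum(others) / len(others) if others else 0.0
        scores.append(-redundancy)  # more redundant -> lower score

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the layer outputs, dimension by dimension.
    dim = len(layer_outputs[0])
    merged = [
        sum(w * feat[k] for w, feat in zip(weights, layer_outputs))
        for k in range(dim)
    ]
    return merged, weights


def adapt(layer_outputs, late_layers):
    """Late route (toy): pass the consolidated feature through a few layers.

    In a real framework only `late_layers` would take part in
    backpropagation; the early outputs are treated as fixed inputs.
    """
    merged, _ = consolidate(layer_outputs)
    for layer in late_layers:
        merged = layer(merged)
    return merged


# Three "frozen" layer outputs; the first two are identical (redundant).
outputs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
merged, weights = consolidate(outputs)
adapted = adapt(outputs, [lambda x: [2.0 * v for v in x]])
print(weights)
print(adapted)
```

In this toy input the two identical outputs share a lower weight than the distinct third one, which is the intuition behind removing redundancy before the late layers refine the merged feature.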

