SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
July 10, 2024
Authors: Haiwen Diao, Bo Wan, Xu Jia, Yunzhi Zhuge, Ying Zhang, Huchuan Lu, Long Chen
cs.AI
Abstract
Parameter-efficient transfer learning (PETL) has emerged as a flourishing
research field for adapting large pre-trained models to downstream tasks,
greatly reducing trainable parameters while grappling with memory challenges
during fine-tuning. To address this, memory-efficient transfer learning (METL)
methods avoid backpropagating gradients through the large backbone. However,
they compromise by relying exclusively on frozen intermediate outputs, which
limits the exhaustive exploration of prior knowledge from pre-trained models. Moreover,
the dependency and redundancy among cross-layer features are frequently
overlooked, drowning out more discriminative representations and causing
an inherent performance gap (vs. conventional PETL methods). Hence, we propose
an innovative METL strategy called SHERL for resource-limited scenarios to
decouple the entire adaptation into two successive and complementary processes.
In the early route, intermediate outputs are consolidated via an
anti-redundancy operation, enhancing their compatibility for subsequent
interactions; in the late route, a minimal number of late pre-trained layers
alleviates the peak memory overhead and regulates these fairly flexible
features into more adaptive and powerful representations for new domains.
Extensive ablations on vision-and-language and language-only tasks
show that SHERL combines the strengths of both parameter- and memory-efficient
techniques, performing on par with or better than existing methods across
diverse architectures with lower memory during fine-tuning. Our code is publicly available at:
https://github.com/Paranioar/SHERL.
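
The two-route design described in the abstract lends itself to a compact sketch. Below is a minimal PyTorch-style illustration, not the authors' implementation: the early route runs the frozen layers without gradient tracking and consolidates their intermediate outputs with an anti-redundancy weighting, while the late route keeps only a few late pre-trained layers (plus a small adapter) trainable. The split point, the cosine-similarity weighting, and all module names (`SHERLLikeAdapter`, `consolidate`, `adapter`) are illustrative assumptions; see the repository above for the official code.

```python
# Minimal sketch of the two-route idea (assumed details, not the official SHERL code).
import torch
import torch.nn as nn


class SHERLLikeAdapter(nn.Module):
    def __init__(self, backbone_layers: nn.ModuleList, dim: int, num_late: int = 1):
        super().__init__()
        # Early route: frozen layers, executed without building a gradient graph.
        self.early_layers = nn.ModuleList(backbone_layers[:-num_late])
        for p in self.early_layers.parameters():
            p.requires_grad = False
        # Late route: only a minimal suffix of pre-trained layers stays trainable,
        # so backpropagation never traverses the large frozen backbone.
        self.late_layers = nn.ModuleList(backbone_layers[-num_late:])
        # Lightweight projection adapting the consolidated early-route features.
        self.adapter = nn.Linear(dim, dim)

    def consolidate(self, feats: list) -> torch.Tensor:
        # Anti-redundancy consolidation (assumed form): down-weight intermediate
        # outputs that are highly similar to the deepest one, then fuse by a
        # weighted sum, so less redundant layers contribute more.
        stacked = torch.stack(feats)                       # [L, B, N, D]
        anchor = stacked[-1]                               # deepest early output
        sim = torch.cosine_similarity(
            stacked.flatten(2), anchor.flatten(1).unsqueeze(0), dim=-1
        )                                                  # [L, B]
        weights = torch.softmax(1.0 - sim, dim=0)          # less similar -> larger weight
        return (weights[..., None, None] * stacked).sum(0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = []
        with torch.no_grad():                              # no activation graph stored here
            h = x
            for layer in self.early_layers:
                h = layer(h)
                feats.append(h)
        h = self.adapter(self.consolidate(feats))          # fused early-route features
        for layer in self.late_layers:                     # only these layers get gradients
            h = layer(h)
        return h
```

Because the early route runs under `torch.no_grad()`, peak training memory is dominated by the adapter and the few late layers, which is the trade-off the abstract describes.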