

Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization

May 6, 2023
作者: Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Jimmy Ba, Amjad Almahairi
cs.AI

Abstract

Prompt tuning is one of the successful approaches for parameter-efficient tuning of pre-trained language models. Despite being arguably the most parameter-efficient (tuned soft prompts constitute <0.1% of total parameters), it typically performs worse than other efficient tuning methods and is quite sensitive to hyper-parameters. In this work, we introduce Residual Prompt Tuning - a simple and efficient method that significantly improves the performance and stability of prompt tuning. We propose to reparameterize soft prompt embeddings using a shallow network with a residual connection. Our experiments show that Residual Prompt Tuning significantly outperforms prompt tuning on the SuperGLUE benchmark. Notably, our method reaches a +7-point improvement over prompt tuning with T5-Base and allows the prompt length to be reduced by 10x without hurting performance. In addition, we show that our approach is robust to the choice of learning rate and prompt initialization, and is effective in few-shot settings.
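To make the core idea concrete, below is a minimal sketch of the residual reparameterization described in the abstract: trainable soft prompt embeddings are passed through a shallow network and combined with themselves via a residual (skip) connection, and the result is prepended to a frozen model's input. The two-layer MLP bottleneck, LayerNorm, and all sizes are illustrative assumptions, not details taken from the paper.

```python
# Sketch of residual reparameterization of soft prompts (assumptions noted below).
import torch
import torch.nn as nn


class ResidualPromptEncoder(nn.Module):
    """Reparameterizes trainable soft prompt embeddings through a shallow
    network with a residual connection (architecture details are assumed)."""

    def __init__(self, prompt_length: int = 10, embed_dim: int = 768,
                 bottleneck_dim: int = 128):
        super().__init__()
        # Trainable soft prompt embeddings (prompt_length x embed_dim).
        self.prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)
        # Shallow reparameterization network (assumed: two-layer MLP bottleneck).
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, bottleneck_dim),
            nn.ReLU(),
            nn.Linear(bottleneck_dim, embed_dim),
        )
        self.norm = nn.LayerNorm(embed_dim)  # assumed; may differ from the paper

    def forward(self) -> torch.Tensor:
        # Residual connection: reparameterized prompt plus the original embedding.
        return self.norm(self.mlp(self.prompt)) + self.prompt


# Usage sketch: the returned tensor would be prepended to the input embeddings
# of a frozen language model; only the encoder's parameters are updated.
prompts = ResidualPromptEncoder()()
print(prompts.shape)  # torch.Size([10, 768])
```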