Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
November 12, 2023
Authors: Bowen Tan, Yun Zhu, Lijuan Liu, Eric Xing, Zhiting Hu, Jindong Chen
cs.AI
Abstract
Large language models (LLMs) such as T0, FLAN, and OPT-IML excel in
multi-tasking under a unified instruction-following paradigm, where they also
exhibit remarkable generalization abilities to unseen tasks. Despite their
impressive performance, these LLMs, with sizes ranging from several billion to
hundreds of billions of parameters, demand substantial computational resources,
making their training and inference expensive and inefficient. Furthermore,
adapting these models to downstream applications, particularly complex tasks,
is often infeasible due to the extensive hardware requirements for finetuning,
even when utilizing parameter-efficient approaches such as prompt tuning.
Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and
FLAN-PaLM-540B, are not publicly accessible, severely limiting their
customization potential. To address these challenges, we introduce a pretrained
small scorer, Cappy, designed to enhance the performance and efficiency of
multi-task LLMs. With merely 360 million parameters, Cappy functions either
independently on classification tasks or serves as an auxiliary component for
LLMs, boosting their performance. Moreover, Cappy enables efficient
integration of downstream supervision without requiring LLM finetuning or
access to their parameters. Our experiments demonstrate that, when working
independently on 11 language understanding tasks from PromptSource, Cappy
outperforms LLMs that are several orders of magnitude larger. In addition, on
45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced
multi-task LLM FLAN-T5 by a large margin. Furthermore, Cappy can flexibly
cooperate with other LLM adaptations, including finetuning and in-context
learning, offering additional performance gains.
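The abstract describes Cappy acting as an auxiliary scorer alongside a multi-task LLM. As a rough illustration of that pattern only (not the authors' released code), the sketch below samples several candidate responses from a public FLAN-T5 checkpoint and reranks them with a small RoBERTa-sized regression scorer. The scorer head here is untrained and stands in for a Cappy-style pretrained checkpoint; the model names, sampling settings, and helper function are placeholder assumptions chosen for illustration.

# Hedged sketch: rerank LLM candidates with a small (instruction, response) scorer.
# The scorer weights are NOT a trained Cappy checkpoint; they only show the interface.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
)

# Generator: a publicly available instruction-following multi-task LLM.
gen_tok = AutoTokenizer.from_pretrained("google/flan-t5-large")
generator = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

# Scorer: a small encoder with a single regression head that maps an
# (instruction, candidate response) pair to a scalar score. In practice one
# would load a pretrained Cappy-style scorer; "roberta-large" is a placeholder
# backbone of roughly the right size (~355M parameters), with an untrained head.
scorer_tok = AutoTokenizer.from_pretrained("roberta-large")
scorer = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large", num_labels=1
)

def best_response(instruction: str, num_candidates: int = 8) -> str:
    """Sample several candidates from the LLM and return the highest-scored one."""
    inputs = gen_tok(instruction, return_tensors="pt")
    outputs = generator.generate(
        **inputs,
        do_sample=True,
        num_return_sequences=num_candidates,
        max_new_tokens=64,
    )
    candidates = gen_tok.batch_decode(outputs, skip_special_tokens=True)

    # Score each (instruction, candidate) pair with the small scorer.
    pairs = scorer_tok(
        [instruction] * len(candidates),
        candidates,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    with torch.no_grad():
        scores = scorer(**pairs).logits.squeeze(-1)
    return candidates[int(scores.argmax())]

print(best_response("Is the following review positive or negative? 'Great movie!'"))

Because the scorer only reads candidates rather than generating them, this style of adaptation needs no access to the LLM's parameters, which is the property the abstract emphasizes for integrating downstream supervision.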