Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
November 12, 2023
Authors: Bowen Tan, Yun Zhu, Lijuan Liu, Eric Xing, Zhiting Hu, Jindong Chen
cs.AI
Abstract
Large language models (LLMs) such as T0, FLAN, and OPT-IML excel in
multi-tasking under a unified instruction-following paradigm, where they also
exhibit remarkable generalization abilities to unseen tasks. Despite their
impressive performance, these LLMs, with sizes ranging from several billion to
hundreds of billions of parameters, demand substantial computational resources,
making their training and inference expensive and inefficient. Furthermore,
adapting these models to downstream applications, particularly complex tasks,
is often infeasible due to the extensive hardware requirements for finetuning,
even when utilizing parameter-efficient approaches such as prompt tuning.
Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and
FLAN-PaLM-540B, are not publicly accessible, severely limiting their
customization potential. To address these challenges, we introduce a pretrained
small scorer, Cappy, designed to enhance the performance and efficiency of
multi-task LLMs. With merely 360 million parameters, Cappy functions either
independently on classification tasks or serves as an auxiliary component for
LLMs, boosting their performance. Moreover, Cappy enables efficient
integration of downstream supervision without requiring LLM finetuning or
access to their parameters. Our experiments demonstrate that, when working
independently on 11 language understanding tasks from PromptSource, Cappy
outperforms LLMs that are several orders of magnitude larger. In addition, on
45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced
multi-task LLM FLAN-T5 by a large margin. Furthermore, Cappy can flexibly
cooperate with other LLM adaptations, including finetuning and in-context
learning, offering additional performance gains.
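The abstract describes Cappy acting as an auxiliary scorer alongside a multi-task LLM. As a rough illustration of that pattern only (not the authors' released code), the sketch below samples several candidate responses from a public FLAN-T5 checkpoint and reranks them with a small RoBERTa-sized regression scorer. The scorer head here is untrained and stands in for a Cappy-style pretrained checkpoint; the model names, sampling settings, and helper function are placeholder assumptions chosen for illustration.

# Hedged sketch: rerank LLM candidates with a small (instruction, response) scorer.
# The scorer weights are NOT a trained Cappy checkpoint; they only show the interface.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
)

# Generator: a publicly available instruction-following multi-task LLM.
gen_tok = AutoTokenizer.from_pretrained("google/flan-t5-large")
generator = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

# Scorer: a small encoder with a single regression head that maps an
# (instruction, candidate response) pair to a scalar score. In practice one
# would load a pretrained Cappy-style scorer; "roberta-large" is a placeholder
# backbone of roughly the right size (~355M parameters), with an untrained head.
scorer_tok = AutoTokenizer.from_pretrained("roberta-large")
scorer = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large", num_labels=1
)

def best_response(instruction: str, num_candidates: int = 8) -> str:
    """Sample several candidates from the LLM and return the highest-scored one."""
    inputs = gen_tok(instruction, return_tensors="pt")
    outputs = generator.generate(
        **inputs,
        do_sample=True,
        num_return_sequences=num_candidates,
        max_new_tokens=64,
    )
    candidates = gen_tok.batch_decode(outputs, skip_special_tokens=True)

    # Score each (instruction, candidate) pair with the small scorer.
    pairs = scorer_tok(
        [instruction] * len(candidates),
        candidates,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    with torch.no_grad():
        scores = scorer(**pairs).logits.squeeze(-1)
    return candidates[int(scores.argmax())]

print(best_response("Is the following review positive or negative? 'Great movie!'"))

Because the scorer only reads candidates rather than generating them, this style of adaptation needs no access to the LLM's parameters, which is the property the abstract emphasizes for integrating downstream supervision.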