Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
November 12, 2023
作者: Bowen Tan, Yun Zhu, Lijuan Liu, Eric Xing, Zhiting Hu, Jindong Chen
cs.AI
Abstract
Large language models (LLMs) such as T0, FLAN, and OPT-IML, excel in
multi-tasking under a unified instruction-following paradigm, where they also
exhibit remarkable generalization abilities to unseen tasks. Despite their
impressive performance, these LLMs, with sizes ranging from several billion to
hundreds of billions of parameters, demand substantial computational resources,
making their training and inference expensive and inefficient. Furthermore,
adapting these models to downstream applications, particularly complex tasks,
is often unfeasible due to the extensive hardware requirements for finetuning,
even when utilizing parameter-efficient approaches such as prompt tuning.
Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and
FLAN-PaLM-540B, are not publicly accessible, severely limiting their
customization potential. To address these challenges, we introduce a pretrained
small scorer, Cappy, designed to enhance the performance and efficiency of
multi-task LLMs. With merely 360 million parameters, Cappy functions either
independently on classification tasks or serves as an auxiliary component for
LLMs, boosting their performance. Moreover, Cappy efficiently integrates
downstream supervision without requiring LLM finetuning or access to their
parameters. Our experiments demonstrate that, when working
independently on 11 language understanding tasks from PromptSource, Cappy
outperforms LLMs that are several orders of magnitude larger. In addition, on 45
complex tasks from BIG-Bench, Cappy boosts the performance of the advanced
multi-task LLM, FLAN-T5, by a large margin. Furthermore, Cappy flexibly
cooperates with other LLM adaptations, including finetuning and in-context
learning, offering additional performance gains.
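The abstract's core mechanism, using the small scorer as an auxiliary component that ranks an LLM's candidate responses, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the Hugging Face checkpoint names ("btan2/cappy-large", "google/flan-t5-xl") and the sampling settings are assumptions, and the scorer is treated as a sequence-classification model with a single regression head over (instruction, response) pairs.

```python
# Minimal sketch: rerank candidate LLM responses with a small scorer.
# Checkpoint names below are assumptions, not verified references.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    T5Tokenizer,
    T5ForConditionalGeneration,
)

# Small scorer: maps an (instruction, response) pair to a single score.
scorer_tok = AutoTokenizer.from_pretrained("btan2/cappy-large")       # assumed checkpoint
scorer = AutoModelForSequenceClassification.from_pretrained("btan2/cappy-large")

# Multi-task LLM whose outputs we want to boost.
llm_tok = T5Tokenizer.from_pretrained("google/flan-t5-xl")            # assumed checkpoint
llm = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl")

instruction = "Summarize: The quick brown fox jumps over the lazy dog."

# Sample several candidate responses from the LLM.
inputs = llm_tok(instruction, return_tensors="pt")
outputs = llm.generate(
    **inputs, do_sample=True, num_return_sequences=4, max_new_tokens=64
)
candidates = [llm_tok.decode(o, skip_special_tokens=True) for o in outputs]

# Score each (instruction, candidate) pair and keep the highest-scoring one.
with torch.no_grad():
    batch = scorer_tok(
        [instruction] * len(candidates), candidates,
        return_tensors="pt", padding=True, truncation=True,
    )
    scores = scorer(**batch).logits.squeeze(-1)

best = candidates[scores.argmax().item()]
print(best)
```

Because the LLM is only queried for generations, this pattern leaves its parameters untouched; any downstream supervision would instead update the small scorer, which is what makes adaptation feasible without LLM finetuning or parameter access.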