Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer

Abstract

Large language models (LLMs) such as T0, FLAN, and OPT-IML excel at multi-tasking under a unified instruction-following paradigm, where they also exhibit remarkable generalization to unseen tasks. Despite their impressive performance, these LLMs, with sizes ranging from several billion to hundreds of billions of parameters, demand substantial computational resources, making their training and inference expensive and inefficient. Furthermore, adapting these models to downstream applications, particularly complex tasks, is often infeasible because of the extensive hardware requirements for finetuning, even with parameter-efficient approaches such as prompt tuning. Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and FLAN-PaLM-540B, are not publicly accessible, which severely limits their customization potential. To address these challenges, we introduce a pretrained small scorer, Cappy, designed to enhance the performance and efficiency of multi-task LLMs. With merely 360 million parameters, Cappy can either function independently on classification tasks or serve as an auxiliary component for LLMs, boosting their performance. Moreover, Cappy enables efficient integration of downstream supervision without requiring LLM finetuning or access to the LLM's parameters. Our experiments demonstrate that, when working independently on 11 language understanding tasks from PromptSource, Cappy outperforms LLMs that are several orders of magnitude larger. In addition, on 45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced multi-task LLM, FLAN-T5, by a large margin. Finally, Cappy can be combined flexibly with other LLM adaptations, including finetuning and in-context learning, offering additional performance gains.
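To make the role of a scorer concrete, the following minimal sketch shows one way a scorer could be used to choose among candidate responses. It rests on assumptions the abstract does not spell out: that the scorer maps an instruction and a candidate response to a number, and that the highest-scoring candidate is selected. The names `Scorer`, `pick_best`, and `toy_scorer` are hypothetical, and `toy_scorer` is a word-overlap stand-in, not Cappy itself.

```python
from typing import Callable, Sequence

# Hypothetical interface: a scorer maps (instruction, candidate response) to a
# number, where a higher value means the response is judged more suitable.
Scorer = Callable[[str, str], float]


def pick_best(instruction: str, candidates: Sequence[str], scorer: Scorer) -> str:
    """Return the candidate that the scorer rates highest for this instruction."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # max() returns the first candidate in case of a tie.
    return max(candidates, key=lambda c: scorer(instruction, c))


def toy_scorer(instruction: str, response: str) -> float:
    """Stand-in scorer (NOT Cappy): fraction of response words that also
    appear in the instruction. It only exists to make the sketch runnable."""
    instruction_words = set(instruction.lower().split())
    response_words = response.lower().split()
    if not response_words:
        return 0.0
    hits = sum(word in instruction_words for word in response_words)
    return hits / len(response_words)


if __name__ == "__main__":
    instruction = "the capital of france is paris"
    # Classification-style use: the candidates are the label options.
    # LLM-assist use: the candidates would be several sampled LLM outputs.
    print(pick_best(instruction, ["london", "paris"], toy_scorer))
```

Under these assumptions, the same selection routine covers both uses named above: for classification the candidates are the label options, and for boosting an LLM the candidates are outputs sampled from that LLM, which is why no access to the LLM's parameters is needed.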
