Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer

Abstract

Large language models (LLMs) such as T0, FLAN, and OPT-IML excel in multi-tasking under a unified instruction-following paradigm, where they also exhibit remarkable generalization abilities to unseen tasks. Despite their impressive performance, these LLMs, with sizes ranging from several billion to hundreds of billions of parameters, demand substantial computational resources, making their training and inference expensive and inefficient. Furthermore, adapting these models to downstream applications, particularly complex tasks, is often infeasible due to the extensive hardware requirements for finetuning, even when utilizing parameter-efficient approaches such as prompt tuning. Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and FLAN-PaLM-540B, are not publicly accessible, severely limiting their customization potential. To address these challenges, we introduce a pretrained small scorer, Cappy, designed to enhance the performance and efficiency of multi-task LLMs. With merely 360 million parameters, Cappy either functions independently on classification tasks or serves as an auxiliary component for LLMs, boosting their performance. Moreover, Cappy enables efficient integration of downstream supervision without requiring LLM finetuning or access to the LLMs' parameters. Our experiments demonstrate that, when working independently on 11 language understanding tasks from PromptSource, Cappy outperforms LLMs that are several orders of magnitude larger. In addition, on 45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced multi-task LLM FLAN-T5 by a large margin. Finally, Cappy can flexibly cooperate with other LLM adaptations, including finetuning and in-context learning, offering additional performance gains.
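Assuming, as the term "scorer" suggests, that Cappy rates each (instruction, candidate response) pair, both usage modes above reduce to ranking candidates and keeping the top one. The sketch below illustrates only that selection step under stated assumptions: the `Scorer` interface returning a value in [0, 1] is hypothetical, the scorer is faked with a lookup table of made-up scores, and `select_best` and `make_lookup_scorer` are illustrative names, not part of any released Cappy code.

```python
from typing import Callable, Dict, Sequence

# Hypothetical scorer interface (an assumption, not Cappy's published API):
# maps an (instruction, candidate response) pair to a score in [0, 1].
Scorer = Callable[[str, str], float]


def select_best(instruction: str, candidates: Sequence[str], scorer: Scorer) -> str:
    """Return the candidate the scorer rates highest for this instruction.

    Standalone use: `candidates` are a classification task's answer choices.
    Auxiliary use: `candidates` are responses sampled from an LLM, so only the
    LLM's text outputs are needed, never its parameters or gradients.
    """
    if not candidates:
        raise ValueError("select_best needs at least one candidate")
    # max() returns the first maximal item, so candidate order breaks ties.
    return max(candidates, key=lambda c: scorer(instruction, c))


def make_lookup_scorer(table: Dict[str, float]) -> Scorer:
    """Build a fake scorer from fixed, made-up scores (illustration only)."""
    def scorer(instruction: str, response: str) -> float:
        # Unknown responses get the lowest possible score.
        return table.get(response, 0.0)
    return scorer


if __name__ == "__main__":
    # Made-up scores standing in for a trained scorer's outputs.
    fake = make_lookup_scorer({"Paris": 0.92, "Lyon": 0.31, "Marseille": 0.12})
    samples = ["Lyon", "Paris", "Marseille"]  # e.g. three sampled LLM responses
    print(select_best("What is the capital of France?", samples, fake))  # Paris
```

Because only candidate text passes through `select_best`, the LLM that produced the candidates can stay frozen or closed; using a real scorer would mean replacing `make_lookup_scorer` with a call to the trained model.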