Small Models are Valuable Plug-ins for Large Language Models
May 15, 2023
Authors: Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, Julian McAuley
cs.AI
Abstract
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their
weights are often publicly unavailable and their immense sizes make them
difficult to tune with common hardware. As a result, effectively tuning
these models with large-scale supervised data can be challenging. As an
alternative, In-Context Learning (ICL) can only use a small number of
supervised examples due to context length limits. In this paper, we propose
Super In-Context Learning (SuperICL) which allows black-box LLMs to work with
locally fine-tuned smaller models, resulting in superior performance on
supervised tasks. Our experiments demonstrate that SuperICL can improve
performance beyond state-of-the-art fine-tuned models while addressing the
instability problem of in-context learning. Furthermore, SuperICL can enhance
the capabilities of smaller models, such as multilinguality and
interpretability.
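
For intuition, below is a minimal sketch of how a SuperICL-style prompt could be assembled: a locally fine-tuned small model labels each in-context example and the test input with a prediction and confidence, and these are inserted into the prompt so the black-box LLM can confirm or override them. The prompt wording, the function names (build_supericl_prompt, dummy_small_model), and the stand-in classifier are illustrative assumptions, not the paper's exact template.

```python
# Sketch of a SuperICL-style prompt: each in-context example carries the
# small model's prediction and confidence alongside the gold label, and the
# test input carries only the small model's prediction, leaving the final
# label to the black-box LLM. The wording below is illustrative.

from typing import Callable, List, Tuple

def build_supericl_prompt(
    examples: List[Tuple[str, str]],                   # (input text, gold label)
    test_input: str,
    small_model: Callable[[str], Tuple[str, float]],   # returns (label, confidence)
) -> str:
    lines = []
    for text, gold in examples:
        pred, conf = small_model(text)
        lines.append(f"Input: {text}")
        lines.append(f"Small model prediction: {pred} (confidence: {conf:.2f})")
        lines.append(f"Label: {gold}")
        lines.append("")
    pred, conf = small_model(test_input)
    lines.append(f"Input: {test_input}")
    lines.append(f"Small model prediction: {pred} (confidence: {conf:.2f})")
    lines.append("Label:")
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical stand-in for a locally fine-tuned classifier (e.g., RoBERTa on SST-2).
    def dummy_small_model(text: str) -> Tuple[str, float]:
        return ("positive", 0.91) if "great" in text else ("negative", 0.87)

    prompt = build_supericl_prompt(
        examples=[("A great, heartfelt film.", "positive"),
                  ("Dull and overlong.", "negative")],
        test_input="Surprisingly great pacing throughout.",
        small_model=dummy_small_model,
    )
    print(prompt)
```

In practice, the returned prompt would be sent to a black-box LLM, whose completion serves as the final prediction; the small model's confidence gives the LLM a signal for when to defer to or override the plug-in.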