Small Models are Valuable Plug-ins for Large Language Models
May 15, 2023
Authors: Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, Julian McAuley
cs.AI
Abstract
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often publicly unavailable and their immense size makes them difficult to tune with common hardware. As a result, effectively tuning these models with large-scale supervised data can be challenging. As an alternative, In-Context Learning (ICL) can use only a small number of supervised examples due to context length limits. In this paper, we propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models, resulting in superior performance on supervised tasks. Our experiments demonstrate that SuperICL can improve performance beyond state-of-the-art fine-tuned models while addressing the instability problem of in-context learning. Furthermore, SuperICL can enhance the capabilities of smaller models, such as multilinguality and interpretability.
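
For intuition, the sketch below shows how a SuperICL-style prompt might be assembled: a locally fine-tuned small model annotates each in-context example and the test input with its prediction and confidence, and the resulting prompt is sent to a black-box LLM for the final label. The plug-in model, prompt format, and helper functions here are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal SuperICL-style sketch: a small fine-tuned classifier acts as the
# plug-in model whose predictions are embedded into the in-context prompt.
from transformers import pipeline

# A publicly available sentiment classifier stands in for the locally
# fine-tuned plug-in model (illustrative choice).
plugin = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def annotate(text):
    """Return the plug-in model's label and confidence for one input."""
    out = plugin(text)[0]
    return out["label"], out["score"]

def build_prompt(demos, test_input):
    """Concatenate demonstrations, each augmented with the plug-in's prediction."""
    lines = []
    for text, gold in demos:
        label, conf = annotate(text)
        lines.append(
            f"Input: {text}\n"
            f"Plug-in prediction: {label} (confidence {conf:.2f})\n"
            f"Label: {gold}\n"
        )
    label, conf = annotate(test_input)
    lines.append(
        f"Input: {test_input}\n"
        f"Plug-in prediction: {label} (confidence {conf:.2f})\n"
        f"Label:"
    )
    return "\n".join(lines)

prompt = build_prompt(
    demos=[("The movie was a delight.", "positive"),
           ("I want my two hours back.", "negative")],
    test_input="A surprisingly moving story.",
)
print(prompt)
```

The assembled prompt would then be passed to the black-box LLM (e.g. the GPT-3 or GPT-4 API), which can adopt or override the plug-in model's prediction, taking its stated confidence into account.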