Small Models are Valuable Plug-ins for Large Language Models
May 15, 2023
Authors: Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, Julian McAuley
cs.AI
Abstract
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often publicly unavailable and their immense size makes them difficult to tune with common hardware. As a result, effectively tuning these models with large-scale supervised data can be challenging. As an alternative, In-Context Learning (ICL) can use only a small number of supervised examples due to context length limits. In this paper, we propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models, resulting in superior performance on supervised tasks. Our experiments demonstrate that SuperICL can improve performance beyond state-of-the-art fine-tuned models while addressing the instability problem of in-context learning. Furthermore, SuperICL can enhance the capabilities of smaller models, such as multilinguality and interpretability.
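
For intuition, the sketch below shows how a SuperICL-style prompt might be assembled: a locally fine-tuned small model annotates each in-context example and the test input with its prediction and confidence, and the resulting prompt is sent to a black-box LLM for the final label. The plug-in model, prompt format, and helper functions here are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal SuperICL-style sketch: a small fine-tuned classifier acts as the
# plug-in model whose predictions are embedded into the in-context prompt.
from transformers import pipeline

# A publicly available sentiment classifier stands in for the locally
# fine-tuned plug-in model (illustrative choice).
plugin = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def annotate(text):
    """Return the plug-in model's label and confidence for one input."""
    out = plugin(text)[0]
    return out["label"], out["score"]

def build_prompt(demos, test_input):
    """Concatenate demonstrations, each augmented with the plug-in's prediction."""
    lines = []
    for text, gold in demos:
        label, conf = annotate(text)
        lines.append(
            f"Input: {text}\n"
            f"Plug-in prediction: {label} (confidence {conf:.2f})\n"
            f"Label: {gold}\n"
        )
    label, conf = annotate(test_input)
    lines.append(
        f"Input: {test_input}\n"
        f"Plug-in prediction: {label} (confidence {conf:.2f})\n"
        f"Label:"
    )
    return "\n".join(lines)

prompt = build_prompt(
    demos=[("The movie was a delight.", "positive"),
           ("I want my two hours back.", "negative")],
    test_input="A surprisingly moving story.",
)
print(prompt)
```

The assembled prompt would then be passed to the black-box LLM (e.g. the GPT-3 or GPT-4 API), which can adopt or override the plug-in model's prediction, taking its stated confidence into account.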