Small Models are Valuable Plug-ins for Large Language Models
May 15, 2023
Authors: Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, Julian McAuley
cs.AI
Abstract
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their
weights are often publicly unavailable and their immense sizes make them
difficult to tune with common hardware. As a result, effectively tuning
these models with large-scale supervised data can be challenging. As an
alternative, In-Context Learning (ICL) can only use a small number of
supervised examples due to context length limits. In this paper, we propose
Super In-Context Learning (SuperICL) which allows black-box LLMs to work with
locally fine-tuned smaller models, resulting in superior performance on
supervised tasks. Our experiments demonstrate that SuperICL can improve
performance beyond state-of-the-art fine-tuned models while addressing the
instability problem of in-context learning. Furthermore, SuperICL can enhance
the capabilities of smaller models, such as multilinguality and
interpretability.
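
For intuition, below is a minimal sketch of how a SuperICL-style prompt could be assembled: a locally fine-tuned small model labels each in-context example and the test input with a prediction and confidence, and these are inserted into the prompt so the black-box LLM can confirm or override them. The prompt wording, the function names (build_supericl_prompt, dummy_small_model), and the stand-in classifier are illustrative assumptions, not the paper's exact template.

```python
# Sketch of a SuperICL-style prompt: each in-context example carries the
# small model's prediction and confidence alongside the gold label, and the
# test input carries only the small model's prediction, leaving the final
# label to the black-box LLM. The wording below is illustrative.

from typing import Callable, List, Tuple

def build_supericl_prompt(
    examples: List[Tuple[str, str]],                   # (input text, gold label)
    test_input: str,
    small_model: Callable[[str], Tuple[str, float]],   # returns (label, confidence)
) -> str:
    lines = []
    for text, gold in examples:
        pred, conf = small_model(text)
        lines.append(f"Input: {text}")
        lines.append(f"Small model prediction: {pred} (confidence: {conf:.2f})")
        lines.append(f"Label: {gold}")
        lines.append("")
    pred, conf = small_model(test_input)
    lines.append(f"Input: {test_input}")
    lines.append(f"Small model prediction: {pred} (confidence: {conf:.2f})")
    lines.append("Label:")
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical stand-in for a locally fine-tuned classifier (e.g., RoBERTa on SST-2).
    def dummy_small_model(text: str) -> Tuple[str, float]:
        return ("positive", 0.91) if "great" in text else ("negative", 0.87)

    prompt = build_supericl_prompt(
        examples=[("A great, heartfelt film.", "positive"),
                  ("Dull and overlong.", "negative")],
        test_input="Surprisingly great pacing throughout.",
        small_model=dummy_small_model,
    )
    print(prompt)
```

In practice, the returned prompt would be sent to a black-box LLM, whose completion serves as the final prediction; the small model's confidence gives the LLM a signal for when to defer to or override the plug-in.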