Supervised Knowledge Makes Large Language Models Better In-context Learners
December 26, 2023
Authors: Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, Yue Zhang
cs.AI
Abstract
Large Language Models (LLMs) exhibit emergent in-context learning abilities through prompt engineering. Recent progress in large-scale generative models has further expanded their use in real-world language applications. However, the critical challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. While previous in-context learning research has focused on enhancing models to adhere to users' specific instructions and quality expectations, and to avoid undesired outputs, little to no work has explored the use of task-Specific fine-tuned Language Models (SLMs) to improve LLMs' in-context learning during the inference stage. Our primary contribution is a simple yet effective framework that enhances the reliability of LLMs: it 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks. Using our proposed plug-in method, enhanced versions of Llama 2 and ChatGPT surpass their original versions in generalizability and factuality. We offer a comprehensive suite of resources, including 16 curated datasets, prompts, model checkpoints, and LLM outputs across 9 distinct tasks. Our empirical analysis sheds light on the advantages of incorporating discriminative models into LLMs and highlights the potential of our methodology in fostering more reliable LLMs.
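To make the plug-in idea concrete, the sketch below shows one plausible shape of the inference-time pipeline the abstract describes: a task-specific fine-tuned model (SLM) produces a prediction with a confidence score, and that supervised signal is folded into the LLM's prompt before querying it. This is a minimal illustrative sketch, not the authors' released implementation; the function names (query_slm, query_llm), the prompt template, and the example task are all assumptions.

```python
# Illustrative sketch of "supervised knowledge as a plug-in" at inference time.
# Assumption: the SLM is a fine-tuned discriminative classifier and the LLM is
# queried through some chat/completions API. Both calls are stubbed here.

from typing import Tuple


def query_slm(text: str) -> Tuple[str, float]:
    """Stand-in for a task-specific fine-tuned model; returns (label, confidence)."""
    # A real SLM would be, e.g., a fine-tuned encoder served behind this call.
    return "entailment", 0.92


def build_prompt(text: str, slm_label: str, slm_confidence: float) -> str:
    """Expose the SLM's supervised prediction to the LLM inside the prompt."""
    return (
        f"Input: {text}\n"
        f"A task-specific fine-tuned model predicts: {slm_label} "
        f"(confidence {slm_confidence:.2f}).\n"
        "Taking this auxiliary prediction into account, give your final "
        "answer and briefly justify it."
    )


def query_llm(prompt: str) -> str:
    """Stand-in for an LLM call (e.g., Llama 2 or ChatGPT behind an API)."""
    return "entailment"  # placeholder response


if __name__ == "__main__":
    text = (
        "Premise: The meeting was moved to Friday. "
        "Hypothesis: The meeting takes place on Friday."
    )
    label, confidence = query_slm(text)
    print(query_llm(build_prompt(text, label, confidence)))
```

The design intuition, as stated in the abstract, is that the discriminative model contributes calibrated, in-distribution task knowledge while the LLM retains its generative flexibility, which is how the combination can improve out-of-distribution generalization and reduce hallucination relative to prompting the LLM alone.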