Pre-Training to Learn in Context

May 16, 2023
Authors: Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
cs.AI

Abstract

In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community. However, the in-context learning ability is not fully exploited because language models are not explicitly trained to learn in context. To this end, we propose PICL (Pre-training for In-Context Learning), a framework that enhances the in-context learning ability of language models by pre-training the model on a large collection of "intrinsic tasks" found in a general plain-text corpus, using a simple language modeling objective. PICL encourages the model to infer and perform tasks by conditioning on the contexts while maintaining the task generalization of pre-trained models. We evaluate the in-context learning performance of the model trained with PICL on seven widely used text classification datasets and the Super-NaturalInstructions benchmark, which contains 100+ NLP tasks formulated as text generation. Our experiments show that PICL is more effective and more task-generalizable than a range of baselines, outperforming larger language models with nearly 4x the parameters. The code is publicly available at https://github.com/thu-coai/PICL.
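
The idea described in the abstract can be illustrated with a small sketch. The snippet below is not the authors' implementation (the official code is at https://github.com/thu-coai/PICL); it is a minimal, assumption-laden illustration of PICL-style pre-training: paragraphs presumed to share the same "intrinsic task" are concatenated into one sequence, so earlier paragraphs serve as in-context demonstrations for later ones, and the whole sequence is trained with a standard causal language modeling loss. The build_picl_instance helper, the GPT-2 backbone, and the example paragraph group are hypothetical choices made for the sketch.

# Minimal sketch of PICL-style pre-training (assumptions noted above).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def build_picl_instance(paragraphs, max_length=1024):
    # Concatenate paragraphs assumed to share one intrinsic task, so earlier
    # paragraphs act as in-context demonstrations for the later ones.
    text = "\n\n".join(paragraphs)
    return tokenizer(text, truncation=True, max_length=max_length, return_tensors="pt")

# Hypothetical group of paragraphs assumed to instantiate the same intrinsic
# task (sentiment-like statements found in a plain-text corpus).
group = [
    "The movie was a delight from start to finish. It felt uplifting.",
    "I regretted buying a ticket; the plot dragged on forever.",
    "A heartfelt story with wonderful acting that stayed with me.",
]

batch = build_picl_instance(group)

# Standard causal language modeling loss over the whole concatenation;
# GPT2LMHeadModel shifts the labels internally.
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()

In the paper, such intrinsic-task groups are gathered automatically from a large corpus rather than written by hand; the sketch only shows how one group would be turned into a single language-modeling instance.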