Pre-Training to Learn in Context
May 16, 2023
Authors: Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
cs.AI
Abstract
In-context learning, where pre-trained language models learn to perform tasks
from task examples and instructions in their contexts, has attracted much
attention in the NLP community. However, the ability of in-context learning is
not fully exploited because language models are not explicitly trained to learn
in context. To this end, we propose PICL (Pre-training for In-Context
Learning), a framework to enhance the language models' in-context learning
ability by pre-training the model on a large collection of "intrinsic tasks" in
the general plain-text corpus using the simple language modeling objective.
PICL encourages the model to infer and perform tasks by conditioning on the
contexts while maintaining task generalization of pre-trained models. We
evaluate the in-context learning performance of the model trained with PICL on
seven widely used text classification datasets and the Super-NaturalInstructions
benchmark, which contains 100+ NLP tasks formulated as text generation. Our
experiments show that PICL is more effective and task-generalizable than a
range of baselines, outperforming larger language models with nearly 4x as many
parameters. The code is publicly available at https://github.com/thu-coai/PICL.
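
For readers unfamiliar with the setting, the in-context learning format described in the abstract can be illustrated with a minimal sketch: a few task demonstrations are concatenated in front of a query, and the language model is asked to continue the text. This is an illustration only, not code from the PICL repository; the function name `build_icl_prompt` and the "Input:"/"Output:" templates are assumptions made for this example.

```python
from typing import List, Tuple


def build_icl_prompt(
    demos: List[Tuple[str, str]],
    query: str,
    input_prefix: str = "Input:",    # hypothetical template, not from the paper
    output_prefix: str = "Output:",  # hypothetical template, not from the paper
) -> str:
    """Concatenate (input, output) demonstrations and a query into one prompt."""
    # Each demonstration shows the model a complete input/output pair.
    blocks = [f"{input_prefix} {x}\n{output_prefix} {y}" for x, y in demos]
    # The query follows the same format but leaves the output empty,
    # so the model has to infer the task from the preceding context.
    blocks.append(f"{input_prefix} {query}\n{output_prefix}")
    return "\n\n".join(blocks)


demos = [("great movie", "positive"), ("dull plot", "negative")]
prompt = build_icl_prompt(demos, "loved it")
print(prompt)
```

The resulting string would be passed to a language model as plain text. According to the abstract, PICL pre-trains on similarly structured contexts built from "intrinsic tasks" found in a plain-text corpus, using the standard language modeling objective.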