Pre-Training to Learn in Context

Abstract

In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community. However, this ability is not fully exploited because language models are not explicitly trained to learn in context. To this end, we propose PICL (Pre-training for In-Context Learning), a framework that enhances the in-context learning ability of language models by pre-training the model on a large collection of "intrinsic tasks" in the general plain-text corpus using the simple language modeling objective. PICL encourages the model to infer and perform tasks by conditioning on the contexts while maintaining the task generalization of pre-trained models. We evaluate the in-context learning performance of the model trained with PICL on seven widely used text classification datasets and the Super-NaturalInstructions benchmark, which contains 100+ NLP tasks formulated as text generation. Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters. The code is publicly available at https://github.com/thu-coai/PICL.
