Pre-Training to Learn in Context

Abstract

In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community. However, the ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context. To this end, we propose PICL (Pre-training for In-Context Learning), a framework that enhances the in-context learning ability of language models by pre-training them on a large collection of "intrinsic tasks" found in a general plain-text corpus, using the simple language modeling objective. PICL encourages the model to infer and perform tasks by conditioning on the contexts while maintaining the task generalization of pre-trained models. We evaluate the in-context learning performance of the model trained with PICL on seven widely used text classification datasets and on the Super-NaturalInstructions benchmark, which contains 100+ NLP tasks formulated as text generation. Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming language models with nearly 4x as many parameters. The code is publicly available at https://github.com/thu-coai/PICL.
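To make the in-context learning setting concrete, the following is a minimal sketch of how task examples, an optional instruction, and a query might be concatenated into a single context for a language model. The function name `build_icl_prompt`, the "Input:/Output:" template, and the sample reviews are assumptions chosen for illustration; the abstract does not specify a prompt format, and this is not PICL's pre-training data construction.

```python
from typing import List, Tuple


def build_icl_prompt(
    demonstrations: List[Tuple[str, str]],
    query: str,
    instruction: str = "",
) -> str:
    """Concatenate an optional instruction, task examples, and a query.

    The "Input:/Output:" template is a hypothetical choice for illustration.
    """
    parts = [instruction] if instruction else []
    for text, label in demonstrations:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical sentiment-classification demonstrations.
demos = [
    ("A touching, well-acted film.", "positive"),
    ("Two hours I will never get back.", "negative"),
]
prompt = build_icl_prompt(
    demos,
    "Sharp writing and a great cast.",
    instruction="Classify the sentiment of each review.",
)
print(prompt)
```

The model receives this string and is expected to continue it with the label for the final input, inferring the task from the preceding examples alone, with no parameter updates.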
