Pre-Training to Learn in Context

Abstract

In-context learning, where pre-trained language models learn to perform tasks from the task examples and instructions in their contexts, has attracted much attention in the NLP community. However, this ability is not fully exploited because language models are not explicitly trained to learn in context. To this end, we propose PICL (Pre-training for In-Context Learning), a framework that enhances language models' in-context learning ability by pre-training the model on a large collection of "intrinsic tasks" found in a general plain-text corpus, using the simple language modeling objective. PICL encourages the model to infer and perform tasks by conditioning on the contexts while maintaining the task generalization of pre-trained models. We evaluate the in-context learning performance of the model trained with PICL on seven widely used text classification datasets and the Super-NaturalInstructions benchmark, which contains 100+ NLP tasks formulated as text generation. Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters. The code is publicly available at https://github.com/thu-coai/PICL.
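
To make the in-context setting concrete, the sketch below shows one way a few-shot prompt for text classification might be assembled: an optional instruction, a handful of labeled demonstrations, and an unlabeled query that the model is expected to complete. The function name `build_icl_prompt`, the "Input:/Output:" template, and the example sentences are illustrative assumptions; they are not taken from PICL's released code and need not match the format used in the paper.

```python
from typing import List, Tuple


def build_icl_prompt(
    demonstrations: List[Tuple[str, str]],
    query: str,
    instruction: str = "",
) -> str:
    """Join an optional instruction, labeled examples, and an unlabeled query.

    The "Input:/Output:" template is an assumption made for illustration only.
    """
    parts = [instruction] if instruction else []
    for text, label in demonstrations:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The query carries no label; the model continues the text after "Output:".
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical sentiment examples, invented for this sketch.
demos = [("A delightful film.", "positive"), ("Dull and overlong.", "negative")]
prompt = build_icl_prompt(demos, "I would watch it again.", "Classify the sentiment.")
print(prompt)
```

Under this framing, the model receives no gradient update at test time: the task has to be inferred entirely from the demonstrations placed in the context.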
