Cross-Lingual Supervision improves Large Language Models Pre-training

Abstract

Recent rapid progress in pre-training Large Language Models has relied on self-supervised language modeling objectives such as next token prediction or span corruption. Machine Translation Systems, on the other hand, are mostly trained with cross-lingual supervision, which requires aligned data between source and target languages. We demonstrate that pre-training Large Language Models on a mixture of a self-supervised Language Modeling objective and the supervised Machine Translation objective, thereby including cross-lingual parallel data during pre-training, yields models with better in-context learning abilities. Because pre-training is very resource-intensive and a grid search over the mixing ratio between the two objectives is prohibitively expensive, we propose a simple yet effective strategy to learn that ratio during pre-training.
