SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

Abstract

Pre-training large language models is known to be extremely resource intensive and often inefficient, under-utilizing the information encapsulated in the training text sequences. In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and replaced token detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial τ iterations, then transitions to the standard SC loss. We show empirically that the effectiveness of the hybrid objective is tied to the two-stage pre-training schedule, and provide extensive analysis of why this is the case. In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training while enabling a 50% reduction in pre-training iterations and a 40% reduction in total FLOPs. Alternatively, given the same computing budget, we find that SpacTor results in significantly improved downstream benchmark performance.
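
The two-stage curriculum described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the SC and RTD terms are combined, so the additive form and the `rtd_weight` parameter below are assumptions made for illustration, and the function name `spactor_loss` is hypothetical rather than taken from any released code.

```python
def spactor_loss(step: int, tau: int, sc_loss: float, rtd_loss: float,
                 rtd_weight: float = 1.0) -> float:
    """Select the training loss for a given pre-training step.

    Stage 1 (step < tau): hybrid objective, SC plus a weighted RTD term.
    Stage 2 (step >= tau): standard SC loss only.

    Assumption: the hybrid objective is a weighted sum; `rtd_weight`
    is a hypothetical hyperparameter, not a value from the paper.
    """
    if step < tau:
        # Hybrid stage: the RTD signal is added to span corruption.
        return sc_loss + rtd_weight * rtd_loss
    # After the switch point, fall back to plain span corruption.
    return sc_loss


if __name__ == "__main__":
    tau = 3  # hypothetical switch point, chosen small for the demo
    for step in range(5):
        loss = spactor_loss(step, tau, sc_loss=2.0, rtd_loss=0.5)
        stage = "hybrid" if step < tau else "SC only"
        print(f"step={step} stage={stage} loss={loss}")
```

In a real training loop the two loss values would come from the model's forward pass at each step; the point of the sketch is only the switch at iteration τ.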
