
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

January 24, 2024
Authors: Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia DeSalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar
cs.AI

Abstract

Pre-training large language models is known to be extremely resource intensive and often inefficient, under-utilizing the information encapsulated in the training text sequences. In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and replaced token detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial τ iterations, then transitions to the standard SC loss. We show empirically that the effectiveness of the hybrid objective is tied to the two-stage pre-training schedule, and provide extensive analysis on why this is the case. In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training, while enabling a 50% reduction in pre-training iterations and a 40% reduction in total FLOPs. Alternatively, given the same compute budget, we find that SpacTor results in significantly improved downstream benchmark performance.
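
The two-stage schedule described in the abstract can be summarized with a short sketch. The Python snippet below is illustrative only: the function name `spactor_loss`, the hyperparameters `tau` and `rtd_weight`, and the pre-computed loss values stand in for the actual SC and RTD objectives, and this is not the authors' implementation.

```python
# Illustrative sketch of SpacTor's two-stage curriculum (not the authors' code).
# Stage 1 (step < tau): hybrid objective = span corruption (SC) + replaced
# token detection (RTD). Stage 2 (step >= tau): standard SC loss only.
# `tau` and `rtd_weight` are hypothetical hyperparameters used for illustration.

def spactor_loss(step: int, sc_loss: float, rtd_loss: float,
                 tau: int = 120_000, rtd_weight: float = 1.0) -> float:
    """Combine per-step SC and RTD losses according to the two-stage schedule."""
    if step < tau:
        return sc_loss + rtd_weight * rtd_loss  # stage 1: hybrid objective
    return sc_loss                              # stage 2: standard SC pre-training


# Example: the RTD term contributes only before the transition point tau.
print(spactor_loss(step=1_000, sc_loss=2.5, rtd_loss=0.5))    # 3.0
print(spactor_loss(step=200_000, sc_loss=2.5, rtd_loss=0.5))  # 2.5
```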