Pre-training Language Model as a Multi-perspective Course Learner
May 6, 2023
Authors: Beiduo Chen, Shaohan Huang, Zihan Zhang, Wu Guo, Zhenhua Ling, Haizhen Huang, Furu Wei, Weiwei Deng, Qi Zhang
cs.AI
Abstract
ELECTRA, the generator-discriminator pre-training framework, has achieved impressive semantic construction capability on various downstream tasks. Despite its convincing performance, ELECTRA still faces the challenges of monotonous training and deficient interaction. A generator trained with only masked language modeling (MLM) leads to biased learning and label imbalance for the discriminator, decreasing learning efficiency; the lack of an explicit feedback loop from the discriminator to the generator results in a chasm between these two components, underutilizing course learning. In this study, a multi-perspective course learning (MCL) method is proposed to provide multiple degrees and visual angles for sample-efficient pre-training, and to fully leverage the relationship between the generator and the discriminator. Concretely, three self-supervision courses are designed to alleviate the inherent flaws of MLM and balance the labels in a multi-perspective way. Besides, two self-correction courses are proposed to bridge the chasm between the two encoders by creating a "correction notebook" for secondary supervision. Moreover, a course soups trial is conducted to solve the "tug-of-war" dynamics problem of MCL, evolving a stronger pre-trained model. Experimental results show that our method significantly improves ELECTRA's average performance by 2.8 and 3.2 absolute points on the GLUE and SQuAD 2.0 benchmarks, respectively, and outperforms recent advanced ELECTRA-style models under the same settings. The pre-trained MCL model is available at https://huggingface.co/McmanusChen/MCL-base.
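
The abstract leans on two mechanisms: the ELECTRA-style generator-discriminator objective that the courses extend, and a "course soups" weight average used to resolve the tug-of-war dynamics. The sketch below is a minimal, self-contained PyTorch reading of those two ideas, not the authors' implementation; the toy `TinyEncoder`, `electra_step`, and `course_soup` names, the model sizes, and the plain parameter average are all illustrative assumptions.

```python
# Illustrative sketch only: ELECTRA-style MLM + replaced-token-detection (RTD)
# training and a "course soups"-style weight average. Not the MCL authors' code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HID, MASK_ID = 1000, 64, 0  # toy vocabulary/hidden sizes; assumed [MASK] id

class TinyEncoder(nn.Module):
    """Toy Transformer encoder standing in for the generator or discriminator."""
    def __init__(self, out_dim):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HID)
        layer = nn.TransformerEncoderLayer(HID, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(HID, out_dim)

    def forward(self, ids):
        return self.head(self.enc(self.emb(ids)))

generator = TinyEncoder(out_dim=VOCAB)   # fills in masked tokens (MLM)
discriminator = TinyEncoder(out_dim=1)   # flags each token as original vs. replaced

def electra_step(ids, mask_prob=0.15):
    """One ELECTRA-style step: MLM loss on the generator, RTD loss on the
    discriminator over tokens sampled from the generator's predictions."""
    mask = torch.rand(ids.shape) < mask_prob
    corrupted = ids.masked_fill(mask, MASK_ID)

    gen_logits = generator(corrupted)                        # (B, T, VOCAB)
    mlm_loss = F.cross_entropy(gen_logits[mask], ids[mask])

    with torch.no_grad():                                    # sample replacements
        sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    replaced = torch.where(mask, sampled, ids)
    is_replaced = (replaced != ids).float()                  # per-token RTD labels

    disc_logits = discriminator(replaced).squeeze(-1)        # (B, T)
    rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
    return mlm_loss + 50.0 * rtd_loss                        # ELECTRA's loss weighting

def course_soup(state_dicts):
    """Average weights of checkpoints trained under different courses, in the
    spirit of "model soups"; one plausible reading of the course soups trial."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(0)
    return avg

# Toy usage: one training step on random token ids.
ids = torch.randint(1, VOCAB, (2, 16))
electra_step(ids).backward()
```

As a design note, the label-imbalance complaint in the abstract is visible here: with a 15% mask rate most `is_replaced` labels are 0, which is part of what the paper's multi-perspective courses are meant to rebalance.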