Pre-training Language Model as a Multi-perspective Course Learner
May 6, 2023
Authors: Beiduo Chen, Shaohan Huang, Zihan Zhang, Wu Guo, Zhenhua Ling, Haizhen Huang, Furu Wei, Weiwei Deng, Qi Zhang
cs.AI
Abstract
ELECTRA, the generator-discriminator pre-training framework, has achieved
impressive semantic construction capability across various downstream tasks.
Despite the convincing performance, ELECTRA still faces the challenges of
monotonous training and deficient interaction. A generator trained with only
masked language modeling (MLM) leads to biased learning and label imbalance for
the discriminator, decreasing learning efficiency; the absence of an explicit
feedback loop from the discriminator to the generator results in a chasm between
these two components, underutilizing the course learning. In this study, a
multi-perspective course learning (MCL) method is proposed to fetch multiple
angles and visual perspectives for sample-efficient pre-training, and to fully
leverage the relationship between the generator and the discriminator.
Concretely, three self-supervision courses are designed to alleviate the
inherent flaws of MLM and balance the labels in a multi-perspective way.
Besides, two self-correction courses are proposed to
bridge the chasm between the two encoders by creating a "correction notebook"
for secondary-supervision. Moreover, a course soups trial is conducted to solve
the "tug-of-war" dynamics problem of MCL, evolving a stronger pre-trained
model. Experimental results show that our method significantly improves
ELECTRA's average performance by 2.8% and 3.2% absolute points respectively on
GLUE and SQuAD 2.0 benchmarks, and overshadows recent advanced ELECTRA-style
models under the same settings. The pre-trained MCL model is available at
https://huggingface.co/McmanusChen/MCL-base.
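
To make the framework the abstract refers to concrete, below is a minimal sketch of the standard ELECTRA-style generator-discriminator step that MCL builds on: the generator fills masked positions via MLM, and the discriminator performs replaced-token detection (RTD) on the generator's samples. This is not the authors' MCL training code; the module sizes, `TinyEncoder`, `MASK_ID`, and the 50.0 loss weight are illustrative assumptions.

```python
# Minimal ELECTRA-style generator/discriminator step (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, MASK_ID = 30522, 128, 103  # assumed toy constants

class TinyEncoder(nn.Module):
    """Toy stand-in for a transformer encoder (embedding + one encoder layer)."""
    def __init__(self, vocab, hidden):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, ids):
        return self.encoder(self.embed(ids))

generator = TinyEncoder(VOCAB, HIDDEN)
gen_head = nn.Linear(HIDDEN, VOCAB)   # predicts tokens at masked positions
discriminator = TinyEncoder(VOCAB, HIDDEN)
disc_head = nn.Linear(HIDDEN, 1)      # per-token original-vs-replaced logit

def electra_step(input_ids, mask_positions):
    """One combined generator (MLM) + discriminator (RTD) step on a batch."""
    labels = input_ids.clone()
    masked = input_ids.clone()
    masked[mask_positions] = MASK_ID

    # Generator: MLM loss computed on masked positions only.
    gen_logits = gen_head(generator(masked))
    mlm_loss = F.cross_entropy(gen_logits[mask_positions], labels[mask_positions])

    # Sample replacements from the generator to build the corrupted input.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(
            logits=gen_logits[mask_positions]).sample()
    corrupted = input_ids.clone()
    corrupted[mask_positions] = sampled

    # Discriminator: replaced-token detection over every position.
    rtd_labels = (corrupted != input_ids).float()
    rtd_logits = disc_head(discriminator(corrupted)).squeeze(-1)
    rtd_loss = F.binary_cross_entropy_with_logits(rtd_logits, rtd_labels)

    # ELECTRA weights the RTD term heavily; 50.0 is an assumed value here.
    return mlm_loss + 50.0 * rtd_loss

# Usage: a random batch, masking every fifth token, just to show the shapes.
ids = torch.randint(0, VOCAB, (2, 16))
mask_positions = torch.zeros(2, 16, dtype=torch.bool)
mask_positions[:, ::5] = True
loss = electra_step(ids, mask_positions)
loss.backward()
```

MCL's self-supervision and self-correction courses, as described in the abstract, add further training signals and feedback on top of this basic two-encoder objective.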