Improving Language Plasticity via Pretraining with Active Forgetting

Abstract

Pretrained language models (PLMs) are today the primary models for natural language processing. Despite their impressive downstream performance, PLMs can be difficult to apply to new languages, which is a barrier to making their capabilities universally accessible. While prior work has shown that this issue can be addressed by learning a new embedding layer for the new language, doing so is both data-inefficient and compute-inefficient. We propose to use an active forgetting mechanism during pretraining as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability to learn new embeddings within a limited number of updates, an effect similar to meta-learning. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only converge faster during language adaptation but also outperform standard ones in a low-data regime, particularly for languages that are distant from English.
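The reset schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python lists in place of a real model, a placeholder update in place of a gradient step, and hypothetical names (`reset_embeddings`, `pretrain_with_forgetting`, `std`, `vocab_size`, `dim`). It only shows the control flow, in which the embedding table is re-initialized every K updates while the rest of the model keeps its learned weights. How optimizer state is handled at a reset is not modeled here.

```python
import random


def reset_embeddings(embeddings, rng, std=0.02):
    """Re-initialize every embedding vector in place with small Gaussian noise.

    The initialization scale `std` is an assumption made for illustration.
    """
    for row in embeddings:
        for j in range(len(row)):
            row[j] = rng.gauss(0.0, std)


def pretrain_with_forgetting(num_steps, k, vocab_size=8, dim=4, seed=0):
    """Toy loop: update all parameters, but reset the embeddings every k steps."""
    rng = random.Random(seed)
    # Stand-ins for the embedding layer and the transformer body.
    embeddings = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                  for _ in range(vocab_size)]
    body = [rng.gauss(0.0, 0.02) for _ in range(dim)]
    reset_steps = []

    for step in range(1, num_steps + 1):
        # Placeholder update; real pretraining would take a gradient step
        # on the masked language modeling loss here.
        for row in embeddings:
            for j in range(dim):
                row[j] += 0.01
        for j in range(dim):
            body[j] += 0.01

        # Active forgetting: wipe the embeddings, keep the body untouched.
        if step % k == 0:
            reset_embeddings(embeddings, rng)
            reset_steps.append(step)

    return embeddings, body, reset_steps


if __name__ == "__main__":
    _, _, resets = pretrain_with_forgetting(num_steps=10, k=4)
    print(resets)
```

In this sketch the body accumulates every update regardless of K, whereas the embeddings only ever reflect the updates since the most recent reset. That asymmetry is the property the abstract attributes to the forgetting mechanism.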
