Learning Language-Specific Layers for Multilingual Machine Translation

Abstract

Multilingual Machine Translation promises to improve translation quality between non-English languages. This is advantageous for several reasons, namely lower latency (no need to translate twice) and reduced error cascades (e.g., avoiding the loss of gender and formality information when translating through English). On the downside, adding more languages reduces model capacity per language, which is usually countered by increasing the overall model size, making training harder and inference slower. In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity while keeping the amount of computation and the number of parameters used in the forward pass constant. The key idea is to have some layers of the encoder be source or target language-specific, while keeping the remaining layers shared. We study the best way to place these layers using a neural architecture search-inspired approach, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate decoder architecture, and 1.9 chrF (2.2 spBLEU) points on a shared decoder one.
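The routing idea behind LSLs can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names (`ToyLayer`, `SharedLayer`, `LanguageSpecificLayer`, `Encoder`) are hypothetical, the toy layer is an elementwise scale and bias rather than a real Transformer layer, and the placement of shared and language-specific layers is arbitrary, not the placement found by the architecture search. The sketch only shows that total parameters grow with the number of languages while the parameters touched in one forward pass match a fully shared encoder of the same depth.

```python
from typing import List, Sequence


class ToyLayer:
    """Stand-in for a Transformer layer: elementwise scale and bias (illustrative only)."""

    def __init__(self, dim: int, seed: int) -> None:
        # Deterministic toy weights so that distinct layers behave differently.
        self.scale = [1.0 + 0.01 * seed] * dim
        self.bias = [0.1 * seed] * dim

    def num_params(self) -> int:
        return len(self.scale) + len(self.bias)

    def __call__(self, x: Sequence[float]) -> List[float]:
        return [s * v + b for s, v, b in zip(self.scale, x, self.bias)]


class SharedLayer:
    """One set of weights used for every language pair."""

    def __init__(self, dim: int, seed: int) -> None:
        self.layer = ToyLayer(dim, seed)

    def __call__(self, x: Sequence[float], src: str, tgt: str) -> List[float]:
        return self.layer(x)

    def total_params(self) -> int:
        return self.layer.num_params()

    def active_params(self) -> int:
        return self.layer.num_params()


class LanguageSpecificLayer:
    """One sublayer per language; only the selected sublayer runs in a forward pass."""

    def __init__(self, dim: int, languages: Sequence[str], index_by: str, seed: int) -> None:
        if index_by not in ("source", "target"):
            raise ValueError("index_by must be 'source' or 'target'")
        self.index_by = index_by
        self.sublayers = {
            lang: ToyLayer(dim, seed + i) for i, lang in enumerate(languages)
        }

    def __call__(self, x: Sequence[float], src: str, tgt: str) -> List[float]:
        # Route by source or target language, depending on how the layer is indexed.
        lang = src if self.index_by == "source" else tgt
        return self.sublayers[lang](x)

    def total_params(self) -> int:
        return sum(layer.num_params() for layer in self.sublayers.values())

    def active_params(self) -> int:
        # Exactly one sublayer is used per forward pass.
        return next(iter(self.sublayers.values())).num_params()


class Encoder:
    """Stack of shared and language-specific layers applied in order."""

    def __init__(self, layers) -> None:
        self.layers = list(layers)

    def __call__(self, x: Sequence[float], src: str, tgt: str) -> List[float]:
        for layer in self.layers:
            x = layer(x, src, tgt)
        return list(x)

    def total_params(self) -> int:
        return sum(layer.total_params() for layer in self.layers)

    def active_params(self) -> int:
        return sum(layer.active_params() for layer in self.layers)


LANGS = ["de", "en", "ja"]
DIM = 4

# Arbitrary placement chosen for illustration: shared, source LSL, target LSL, shared.
encoder = Encoder([
    SharedLayer(DIM, seed=1),
    LanguageSpecificLayer(DIM, LANGS, "source", seed=10),
    LanguageSpecificLayer(DIM, LANGS, "target", seed=20),
    SharedLayer(DIM, seed=2),
])
# Fully shared encoder of the same depth, for comparison.
baseline = Encoder([SharedLayer(DIM, seed=i) for i in range(4)])

x = [1.0, 2.0, 3.0, 4.0]
out_de_ja = encoder(x, src="de", tgt="ja")
out_de_en = encoder(x, src="de", tgt="en")

print("total parameters:", encoder.total_params(), "vs baseline", baseline.total_params())
print("parameters per forward pass:", encoder.active_params(), "vs baseline", baseline.active_params())
```

In this toy setup the encoder stores more parameters than the fully shared baseline, yet each forward pass uses the same number as the baseline, because each language-specific layer runs only the sublayer selected by the source or target language.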
