

Learning Language-Specific Layers for Multilingual Machine Translation

May 4, 2023
Authors: Telmo Pessoa Pires, Robin M. Schmidt, Yi-Hsiu Liao, Stephan Peitz
cs.AI

Abstract

Multilingual Machine Translation promises to improve translation quality between non-English languages. This is advantageous for several reasons, namely lower latency (no need to translate twice) and reduced error cascades (e.g., avoiding losing gender and formality information when translating through English). On the downside, adding more languages reduces model capacity per language, which is usually countered by increasing the overall model size, making training harder and inference slower. In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity while keeping the amount of computation and the number of parameters used in the forward pass constant. The key idea is to have some layers of the encoder be source or target language-specific, while keeping the remaining layers shared. We study the best way to place these layers using a neural architecture search-inspired approach, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate decoder architecture, and 1.9 chrF (2.2 spBLEU) on a shared decoder one.
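
To make the idea of Language-Specific Transformer Layers more concrete, below is a minimal PyTorch sketch, not the authors' implementation: a LanguageSpecificLayer keeps one encoder layer per language and routes each input through the layer for the given language, while a MixedEncoder interleaves such layers with ordinary shared layers. The class names, the positions in lsl_positions, the routing by a single source-language ID, and the model dimensions are all illustrative assumptions; the paper chooses placements and source/target indexing via its neural architecture search-inspired procedure.

```python
import torch
import torch.nn as nn


class LanguageSpecificLayer(nn.Module):
    """One Transformer encoder layer per language; routes inputs by language ID."""

    def __init__(self, languages, d_model=512, nhead=8, dim_feedforward=2048):
        super().__init__()
        self.layers = nn.ModuleDict({
            lang: nn.TransformerEncoderLayer(
                d_model=d_model, nhead=nhead,
                dim_feedforward=dim_feedforward, batch_first=True)
            for lang in languages
        })

    def forward(self, x, lang):
        # Only the selected language's layer runs, so the compute and the
        # parameters used in this forward pass match a single shared layer.
        return self.layers[lang](x)


class MixedEncoder(nn.Module):
    """Encoder interleaving shared layers with language-specific ones."""

    def __init__(self, languages, num_layers=6, lsl_positions=(2, 4), d_model=512):
        super().__init__()
        self.blocks = nn.ModuleList()
        self.is_lsl = []
        for i in range(num_layers):
            if i in lsl_positions:  # hypothetical placement; the paper searches for it
                self.blocks.append(LanguageSpecificLayer(languages, d_model=d_model))
                self.is_lsl.append(True)
            else:
                self.blocks.append(nn.TransformerEncoderLayer(
                    d_model=d_model, nhead=8, batch_first=True))
                self.is_lsl.append(False)

    def forward(self, x, lang):
        # Routing here is by a single language ID (e.g. the source language);
        # the paper also studies target-language-indexed layers.
        for block, lsl in zip(self.blocks, self.is_lsl):
            x = block(x, lang) if lsl else block(x)
        return x


if __name__ == "__main__":
    enc = MixedEncoder(["de", "fr", "pt"])
    out = enc(torch.randn(2, 10, 512), lang="de")
    print(out.shape)  # torch.Size([2, 10, 512])
```

Because only one language's sub-layer executes per example, the per-forward-pass compute and active parameter count stay the same as with a plain shared layer, even though the total parameter count grows with the number of languages.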