Learning Language-Specific Layers for Multilingual Machine Translation
May 4, 2023
Authors: Telmo Pessoa Pires, Robin M. Schmidt, Yi-Hsiu Liao, Stephan Peitz
cs.AI
Abstract
Multilingual Machine Translation promises to improve translation quality
between non-English languages. This is advantageous for several reasons, namely
lower latency (no need to translate twice), and reduced error cascades (e.g.,
avoiding losing gender and formality information when translating through
English). On the downside, adding more languages reduces model capacity per
language, which is usually countered by increasing the overall model size,
making training harder and inference slower. In this work, we introduce
Language-Specific Transformer Layers (LSLs), which allow us to increase model
capacity, while keeping the amount of computation and the number of parameters
used in the forward pass constant. The key idea is to have some layers of the
encoder be source or target language-specific, while keeping the remaining
layers shared. We study the best way to place these layers using an approach
inspired by neural architecture search, and achieve an improvement of 1.3 chrF
(1.5 spBLEU) points over not using LSLs on a separate-decoder architecture, and
of 1.9 chrF (2.2 spBLEU) points on a shared-decoder one.
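
To make the core idea concrete, the sketch below shows one way a Language-Specific Transformer Layer could be wired up in PyTorch: each LSL holds one encoder layer per language, and a batch (assumed here to contain a single language direction) is routed only through the layer matching its source or target language, so the compute and parameters active in the forward pass stay constant. This is an illustrative sketch under our own assumptions (module names, layer placement, and a 4-language setup are hypothetical), not the authors' implementation or the placement found by their architecture search.

```python
# Illustrative sketch of Language-Specific Transformer Layers (LSLs).
# Assumption: each batch contains sentences from a single (source, target) pair.
import torch
import torch.nn as nn


class LanguageSpecificLayer(nn.Module):
    """One Transformer encoder layer per language; only the layer matching
    the given language index is used in a forward pass."""

    def __init__(self, num_languages: int, d_model: int = 512, nhead: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_languages)
        )

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # Only the selected language's weights participate in this pass,
        # keeping per-sentence compute constant as languages are added.
        return self.layers[lang_id](x)


class MixedEncoder(nn.Module):
    """Encoder mixing shared layers with source- and target-indexed LSLs.
    The placement pattern here is made up for illustration."""

    def __init__(self, num_languages: int, d_model: int = 512, nhead: int = 8):
        super().__init__()
        self.shared_bottom = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.source_lsl = LanguageSpecificLayer(num_languages, d_model, nhead)
        self.target_lsl = LanguageSpecificLayer(num_languages, d_model, nhead)
        self.shared_top = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)

    def forward(self, x: torch.Tensor, src_lang: int, tgt_lang: int) -> torch.Tensor:
        x = self.shared_bottom(x)
        x = self.source_lsl(x, src_lang)   # indexed by the source language
        x = self.target_lsl(x, tgt_lang)   # indexed by the target language
        return self.shared_top(x)


if __name__ == "__main__":
    enc = MixedEncoder(num_languages=4)
    tokens = torch.randn(2, 16, 512)       # (batch, seq_len, d_model)
    out = enc(tokens, src_lang=0, tgt_lang=2)
    print(out.shape)                        # torch.Size([2, 16, 512])
```

Note that while total parameters grow with the number of languages, each forward pass touches only one language-specific layer per LSL slot, which is what keeps inference cost flat relative to a fully shared encoder of the same depth.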