Learning Language-Specific Layers for Multilingual Machine Translation

Abstract

Multilingual machine translation promises to improve translation quality between non-English languages. This is advantageous for several reasons, notably lower latency (no need to translate twice) and reduced error cascades (e.g., avoiding the loss of gender and formality information when translating through English). On the downside, adding more languages reduces model capacity per language, which is usually countered by increasing the overall model size, making training harder and inference slower. In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity while keeping the amount of computation and the number of parameters used in the forward pass constant. The key idea is to make some layers of the encoder source- or target-language-specific while keeping the remaining layers shared. We study the best way to place these layers using an approach inspired by neural architecture search, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate-decoder architecture, and 1.9 chrF (2.2 spBLEU) points on a shared-decoder one.
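The routing idea behind LSLs can be illustrated with a minimal sketch. It is not the paper's implementation: `DenseLayer` is a hypothetical toy stand-in for a full Transformer layer, the names `LanguageSpecificLayer`, `Encoder`, and `index_by` are assumptions introduced here, and the layer placement in the example is arbitrary rather than the result of the architecture search. The sketch only shows that total capacity grows with the number of languages while each forward pass touches one language's weights.

```python
import random


class DenseLayer:
    """Toy stand-in for a Transformer layer: y = W x + b (hypothetical)."""

    def __init__(self, dim, seed):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(dim)]
        self.bias = [0.0] * dim

    def num_params(self):
        return len(self.weight) * len(self.weight[0]) + len(self.bias)

    def __call__(self, x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


class LanguageSpecificLayer:
    """Holds one sub-layer per language; only the selected one is executed."""

    def __init__(self, dim, languages, index_by):
        if index_by not in ("source", "target"):
            raise ValueError("index_by must be 'source' or 'target'")
        self.index_by = index_by
        # One independent set of weights per language.
        self.layers = {lang: DenseLayer(dim, seed=i)
                       for i, lang in enumerate(languages)}

    def total_params(self):
        # Capacity stored in the model: grows with the number of languages.
        return sum(layer.num_params() for layer in self.layers.values())

    def active_params(self):
        # Parameters used in one forward pass: a single sub-layer.
        return next(iter(self.layers.values())).num_params()

    def __call__(self, x, src_lang, tgt_lang):
        lang = src_lang if self.index_by == "source" else tgt_lang
        return self.layers[lang](x)


class Encoder:
    """Stack mixing shared layers and language-specific layers."""

    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x, src_lang, tgt_lang):
        for layer in self.layers:
            if isinstance(layer, LanguageSpecificLayer):
                x = layer(x, src_lang, tgt_lang)
            else:
                x = layer(x)  # shared across all language pairs
        return x


LANGS = ["de", "en", "ko"]
DIM = 4

# Arbitrary placement chosen for illustration only.
source_lsl = LanguageSpecificLayer(DIM, LANGS, index_by="source")
target_lsl = LanguageSpecificLayer(DIM, LANGS, index_by="target")
encoder = Encoder([source_lsl, DenseLayer(DIM, seed=100), target_lsl])

features = [1.0, 0.5, -0.5, 2.0]
encoded = encoder(features, src_lang="de", tgt_lang="ko")
print(source_lsl.total_params(), source_lsl.active_params())
```

In this sketch, adding a language adds one more sub-layer to each `LanguageSpecificLayer`, so stored parameters increase, but the work done per sentence is unchanged because exactly one sub-layer runs.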

