FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation

Abstract

In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). FuxiMT is trained in two stages: we first pre-train the model on a massive Chinese corpus, then conduct multilingual fine-tuning on a large parallel dataset covering 65 languages. FuxiMT incorporates Mixture-of-Experts (MoE) and employs a curriculum learning strategy to achieve robust performance across resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly in low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
