FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation
May 20, 2025
Authors: Shaolin Zhu, Tianyu Dong, Bo Li, Deyi Xiong
cs.AI
Abstract
In this paper, we present FuxiMT, a novel Chinese-centric multilingual
machine translation model powered by a sparsified large language model (LLM).
We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on
a massive Chinese corpus and then conduct multilingual fine-tuning on a large
parallel dataset encompassing 65 languages. FuxiMT incorporates a
Mixture-of-Experts (MoE) architecture and employs a curriculum learning
strategy for robust performance across varying resource levels. Experimental
results demonstrate
that FuxiMT significantly outperforms strong baselines, including
state-of-the-art LLMs and machine translation models, particularly under
low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot
translation capabilities for unseen language pairs, indicating its potential to
bridge communication gaps where parallel data are scarce or unavailable.
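
The abstract does not describe FuxiMT's sparsification or MoE configuration in detail. For readers unfamiliar with the mechanism it names, below is a minimal PyTorch sketch of a generic top-k sparse Mixture-of-Experts feed-forward layer; the class name `MoEFeedForward` and the hyperparameters (`num_experts=8`, `top_k=2`, GELU experts) are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Generic sparse MoE feed-forward layer (illustrative, not FuxiMT's):
    a router sends each token to its top-k experts, so only a fraction of
    the layer's parameters is active per token."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for per-token routing
        tokens = x.reshape(-1, x.size(-1))
        gate_logits = self.router(tokens)                       # (n_tokens, num_experts)
        weights, indices = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                    # renormalize over chosen experts
        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                    # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)
```

With top-2 routing over 8 experts, each token activates only a quarter of the expert parameters per layer, which is how MoE sparsification keeps per-token compute well below that of a dense model with the same total parameter count.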
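Likewise, the abstract states only that a curriculum learning strategy yields robustness across resource levels, without giving the schedule. The toy sketch below shows one common recipe for multilingual MT, sampling high-resource language pairs early and mixing in lower-resource pairs as training progresses; the three-tier split, the linear schedule, and the function `curriculum_sampler` are all assumptions for illustration.

```python
import random

def curriculum_sampler(pairs_by_tier, step, total_steps):
    """Toy curriculum schedule (illustrative assumption, not the paper's):
    tier weights shift linearly from high- toward low-resource pairs."""
    progress = step / total_steps
    weights = {
        "high": max(0.2, 1.0 - progress),
        "medium": min(0.5, 0.2 + progress / 2),
        "low": min(0.6, progress),
    }
    tiers = list(pairs_by_tier)
    tier = random.choices(tiers, weights=[weights[t] for t in tiers])[0]
    return random.choice(pairs_by_tier[tier])

# Hypothetical resource tiers for Chinese-centric pairs.
pairs = {
    "high": [("zh", "en"), ("zh", "fr")],
    "medium": [("zh", "th")],
    "low": [("zh", "lo")],
}
print(curriculum_sampler(pairs, step=100, total_steps=10_000))
```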