FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation
May 20, 2025
Authors: Shaolin Zhu, Tianyu Dong, Bo Li, Deyi Xiong
cs.AI
Abstract
In this paper, we present FuxiMT, a novel Chinese-centric multilingual
machine translation model powered by a sparsified large language model (LLM).
We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on
a massive Chinese corpus and then conduct multilingual fine-tuning on a large
parallel dataset encompassing 65 languages. FuxiMT incorporates a
Mixture-of-Experts (MoE) architecture and employs a curriculum learning
strategy for robust performance across varying resource levels. Experimental
results demonstrate
that FuxiMT significantly outperforms strong baselines, including
state-of-the-art LLMs and machine translation models, particularly under
low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot
translation capabilities for unseen language pairs, indicating its potential to
bridge communication gaps where parallel data are scarce or unavailable.
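
The abstract does not describe FuxiMT's sparsification or MoE configuration in detail. For readers unfamiliar with the mechanism it names, below is a minimal PyTorch sketch of a generic top-k sparse Mixture-of-Experts feed-forward layer; the class name `MoEFeedForward` and the hyperparameters (`num_experts=8`, `top_k=2`, GELU experts) are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Generic sparse MoE feed-forward layer (illustrative, not FuxiMT's):
    a router sends each token to its top-k experts, so only a fraction of
    the layer's parameters is active per token."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for per-token routing
        tokens = x.reshape(-1, x.size(-1))
        gate_logits = self.router(tokens)                       # (n_tokens, num_experts)
        weights, indices = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                    # renormalize over chosen experts
        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                    # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)
```

With top-2 routing over 8 experts, each token activates only a quarter of the expert parameters per layer, which is how MoE sparsification keeps per-token compute well below that of a dense model with the same total parameter count.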
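Likewise, the abstract states only that a curriculum learning strategy yields robustness across resource levels, without giving the schedule. The toy sketch below shows one common recipe for multilingual MT, sampling high-resource language pairs early and mixing in lower-resource pairs as training progresses; the three-tier split, the linear schedule, and the function `curriculum_sampler` are all assumptions for illustration.

```python
import random

def curriculum_sampler(pairs_by_tier, step, total_steps):
    """Toy curriculum schedule (illustrative assumption, not the paper's):
    tier weights shift linearly from high- toward low-resource pairs."""
    progress = step / total_steps
    weights = {
        "high": max(0.2, 1.0 - progress),
        "medium": min(0.5, 0.2 + progress / 2),
        "low": min(0.6, progress),
    }
    tiers = list(pairs_by_tier)
    tier = random.choices(tiers, weights=[weights[t] for t in tiers])[0]
    return random.choice(pairs_by_tier[tier])

# Hypothetical resource tiers for Chinese-centric pairs.
pairs = {
    "high": [("zh", "en"), ("zh", "fr")],
    "medium": [("zh", "th")],
    "low": [("zh", "lo")],
}
print(curriculum_sampler(pairs, step=100, total_steps=10_000))
```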