FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation

May 20, 2025
Authors: Shaolin Zhu, Tianyu Dong, Bo Li, Deyi Xiong
cs.AI

Abstract

In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset encompassing 65 languages. FuxiMT incorporates Mixture-of-Experts (MoEs) and employs a curriculum learning strategy for robust performance across various resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly under low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
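
The abstract names two concrete mechanisms: sparse Mixture-of-Experts layers and a resource-aware curriculum. A short sketch of each may help make them concrete. First, a minimal top-2-gated MoE feed-forward layer in PyTorch. This is a generic illustration of the technique, not FuxiMT's published implementation; the class name, expert count, and gating details are all assumptions.

```python
# Minimal sketch of an MoE feed-forward layer with top-k gating.
# Hypothetical illustration only; not FuxiMT's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent position-wise feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)                  # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # renormalize over chosen experts
        out = torch.zeros_like(tokens)
        # Only the selected experts run for each token: this is the sparsity.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)
```

For each token, only the `top_k` routed experts execute, so per-token compute scales with `top_k` rather than with the total expert count. This lets a model grow capacity without a proportional inference cost, which is one common way an LLM is sparsified as the abstract describes.

Second, the curriculum learning strategy can be read as ordering fine-tuning data by resource level: begin with high-resource language pairs, then progressively mix in lower-resource ones. The schedule below is a hypothetical reading of "robust performance across various resource levels"; the paper's actual schedule is not specified here.

```python
# Hypothetical curriculum over resource levels: each phase keeps the pairs
# from earlier phases and adds the next (lower) resource tier.
def curriculum_phases(pairs_by_resource: dict[str, list[str]]) -> list[list[str]]:
    order = ["high", "medium", "low"]
    phases, seen = [], []
    for level in order:
        seen += pairs_by_resource.get(level, [])
        phases.append(list(seen))
    return phases


# Example: phase 1 trains on zh-en only; phase 3 includes all 65-language pairs.
phases = curriculum_phases({"high": ["zh-en"], "medium": ["zh-th"], "low": ["zh-lo"]})
```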
