FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation

Abstract

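In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset encompassing 65 languages. FuxiMT incorporates Mixture-of-Experts (MoEs) and employs a curriculum learning strategy for robust performance across various resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly under low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.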
