A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
September 20, 2023
Authors: Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla
cs.AI
Abstract
Generative Large Language Models (LLMs) have achieved remarkable advancements
in various NLP tasks. However, these advances have not been reflected in the
translation task, especially for LLMs of moderate size (i.e., 7B or 13B
parameters), which still lag behind conventional supervised encoder-decoder
translation models. Previous studies have attempted to improve the translation
capabilities of these moderate LLMs, but their gains have been limited. In this
study, we propose a novel fine-tuning approach for LLMs that is specifically
designed for the translation task, eliminating the need for the abundant
parallel data that traditional translation models usually depend on. Our
approach consists of two fine-tuning stages: initial fine-tuning on monolingual
data followed by subsequent fine-tuning on a small set of high-quality parallel
data. We introduce the LLM developed through this strategy as Advanced Language
Model-based trAnslator (ALMA). Based on LLaMA-2 as our underlying model, our
results show that the model can achieve an average improvement of more than 12
BLEU and 12 COMET over its zero-shot performance across 10 translation
directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test
datasets. The performance is significantly better than all prior work and even
superior to the NLLB-54B model and GPT-3.5-text-davinci-003, with only 7B or
13B parameters. This method establishes the foundation for a novel training
paradigm in machine translation.
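The two-stage recipe described above (continued training on monolingual data, then fine-tuning on a small set of high-quality parallel data) can be sketched with the Hugging Face transformers and datasets libraries. Everything below is an illustrative sketch, not the paper's released training code: the dataset names (my_org/monolingual_mix, my_org/high_quality_parallel), the prompt template, and all hyperparameters are placeholder assumptions.

```python
# Minimal sketch of an ALMA-style two-stage fine-tuning run (assumed configuration).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # underlying model named in the abstract

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)


def tokenize(batch):
    # Plain causal-LM tokenization; the collator below derives labels from the inputs.
    return tokenizer(batch["text"], truncation=True, max_length=512)


def run_stage(dataset, output_dir, epochs):
    # One fine-tuning stage: standard causal-LM training on `dataset`.
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
        logging_steps=100,
        save_strategy="epoch",
    )
    Trainer(
        model=model,
        args=args,
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()


# Stage 1: continued fine-tuning on monolingual text in the languages of interest.
mono = load_dataset("my_org/monolingual_mix", split="train")  # placeholder corpus
mono = mono.map(tokenize, batched=True, remove_columns=mono.column_names)
run_stage(mono, output_dir="alma_stage1_monolingual", epochs=1)


# Stage 2: fine-tuning on a small set of high-quality parallel sentence pairs,
# rendered as translation prompts (the template below is an assumption).
def format_pair(example):
    text = (
        f"Translate this from {example['src_lang']} to {example['tgt_lang']}:\n"
        f"{example['src_lang']}: {example['source']}\n"
        f"{example['tgt_lang']}: {example['target']}"
    )
    return {"text": text}


parallel = load_dataset("my_org/high_quality_parallel", split="train")  # placeholder
parallel = parallel.map(format_pair)
parallel = parallel.map(tokenize, batched=True, remove_columns=parallel.column_names)
run_stage(parallel, output_dir="alma_stage2_parallel", epochs=2)
```

For the reported metrics, corpus-level BLEU can be computed with sacreBLEU as in the short sketch below (the strings are placeholders); COMET scores would be obtained analogously with the Unbabel comet package and one of its WMT checkpoints.

```python
import sacrebleu

# Placeholder system outputs and references for one translation direction.
hypotheses = ["Das ist ein kleiner Test."]
references = [["Dies ist ein kleiner Test."]]  # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```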