A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Abstract

Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not carried over to translation, especially for models of moderate size (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate-sized LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task and eliminates the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data, followed by fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through this strategy as Advanced Language Model-based trAnslator (ALMA). With LLaMA-2 as the underlying model, our results show an average improvement of more than 12 BLEU and 12 COMET over zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. This performance is significantly better than all prior work and, with only 7B or 13B parameters, even surpasses the NLLB-54B model and GPT-3.5-text-davinci-003. This method establishes the foundation for a novel training paradigm in machine translation.
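The two-stage recipe described in the abstract can be pictured as a simple training schedule: monolingual text first, then a small set of parallel pairs. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `Stage`, `format_parallel`, `build_schedule`, and `run_schedule` are hypothetical, the prompt template is an illustrative assumption rather than the one used for ALMA, and `train_step` stands in for whatever optimizer update a real training loop would perform.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Stage:
    """One fine-tuning stage: a label plus the texts to train on."""
    name: str
    examples: List[str]


def format_parallel(src_lang: str, tgt_lang: str, src: str, tgt: str) -> str:
    # Hypothetical prompt template (an assumption for illustration only):
    # the source sentence and its reference translation form one training text.
    return (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {src}\n"
        f"{tgt_lang}: {tgt}"
    )


def build_schedule(
    monolingual: Iterable[str],
    parallel: Iterable[Tuple[str, str, str, str]],
) -> List[Stage]:
    # Stage 1: plain monolingual text.
    stage1 = Stage("monolingual", list(monolingual))
    # Stage 2: a small set of high-quality parallel pairs, rendered as prompts.
    stage2 = Stage("parallel", [format_parallel(*pair) for pair in parallel])
    return [stage1, stage2]


def run_schedule(schedule: List[Stage], train_step: Callable[[str, str], None]) -> None:
    # Stages run strictly in order; every stage-1 text is seen before stage 2.
    for stage in schedule:
        for text in stage.examples:
            train_step(stage.name, text)


if __name__ == "__main__":
    mono = ["Ein Satz auf Deutsch.", "A sentence in English."]
    para = [("German", "English", "Guten Morgen.", "Good morning.")]
    log: List[str] = []
    # The lambda is a placeholder for a real parameter update.
    run_schedule(build_schedule(mono, para), lambda stage, text: log.append(stage))
    print(log)  # ['monolingual', 'monolingual', 'parallel']
```

Keeping the stages as ordered data rather than two separate scripts makes the ordering constraint explicit: the parallel data is only ever applied to a model that has already seen the monolingual data.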
