Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Abstract

Moderate-sized large language models (LLMs), those with 7B or 13B parameters, exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, such as ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task, emphasizing the quality issues present in the reference data even though it is human-generated. Then, in contrast to SFT, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.
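
The abstract does not spell out the CPO objective, so the following is only a minimal sketch of one commonly described form of it: a sigmoid preference term that rewards a higher log-probability for the preferred translation than for the dispreferred one, plus a negative log-likelihood term on the preferred translation. The function names, the `beta` default, and the sequence-level log-probability inputs are assumptions made for illustration, not details taken from the text above.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def contrastive_preference_loss(
    logp_preferred: float,
    logp_dispreferred: float,
    beta: float = 0.1,  # assumed scaling factor, chosen only for illustration
) -> float:
    """Sketch of a CPO-style loss for a single (preferred, dispreferred) pair.

    Inputs are sequence-level log-probabilities under the model being trained.
    """
    # Preference term: push the preferred translation above the dispreferred one.
    preference_term = -log_sigmoid(beta * (logp_preferred - logp_dispreferred))
    # Likelihood term: keep the model close to the preferred translation.
    nll_term = -logp_preferred
    return preference_term + nll_term
```

In this sketch the loss shrinks as the model assigns relatively more probability to the preferred translation, which is the sense in which training steers away from adequate but imperfect outputs.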
