

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

January 16, 2024
Authors: Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim
cs.AI

Abstract

Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task, emphasizing the quality issues present in the reference data, even though it is human-generated. Then, in contrast to SFT, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.
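The contrastive objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes CPO's published form, a reference-model-free DPO-style preference term plus a negative log-likelihood term on the preferred translation, with the per-sequence log-probabilities supplied as plain floats rather than computed from a model.

```python
import math


def cpo_loss(logp_preferred: float, logp_dispreferred: float, beta: float = 0.1) -> float:
    """Sketch of a CPO-style loss for one (preferred, dis-preferred) translation pair.

    logp_preferred / logp_dispreferred: the model's log-probabilities of the
    preferred and dis-preferred translations of the same source sentence.
    beta: scaling hyperparameter on the log-probability margin (assumed value).
    """
    # Preference term: a log-sigmoid loss on the scaled margin, which pushes
    # the model to rank the preferred translation above the dis-preferred one.
    margin = beta * (logp_preferred - logp_dispreferred)
    prefer_term = -math.log(1.0 / (1.0 + math.exp(-margin)))
    # Regularization term: plain negative log-likelihood on the preferred
    # translation, so the model keeps assigning it high probability.
    nll_term = -logp_preferred
    return prefer_term + nll_term
```

Widening the gap between the two translations lowers the loss: for a fixed preferred log-probability, `cpo_loss(-1.0, -5.0)` is smaller than `cpo_loss(-1.0, -1.0)`, which is the ranking behavior the preference term is meant to encourage.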