Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
January 16, 2024
Authors: Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim
cs.AI
Abstract
Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, such as ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs on the MT task, emphasizing the quality issues present in the reference data, even though it is human-generated. Then, in contrast to SFT, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating translations that are adequate but not perfect. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, matches or exceeds the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.
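The abstract names CPO but does not spell out its objective. As a rough illustration only, the sketch below shows one way a contrastive preference loss of this kind could be written in PyTorch, assuming a reference-free, DPO-style preference term combined with a negative log-likelihood term on the preferred translation; the function name, the `beta` weight, and this exact combination are assumptions for illustration, not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def cpo_style_loss(logp_preferred: torch.Tensor,
                   logp_dispreferred: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
    """Illustrative contrastive preference loss (hypothetical sketch).

    logp_preferred / logp_dispreferred: summed token log-probabilities,
    under the policy model, of the preferred and dispreferred
    translations of the same source sentence, shape (batch,).
    """
    # Preference term: push the preferred translation's likelihood
    # above the dispreferred one's (reference-free, DPO-style margin).
    prefer = -F.logsigmoid(beta * (logp_preferred - logp_dispreferred))
    # Regularization term: keep likelihood on the preferred translation
    # high (a standard negative log-likelihood term).
    nll = -logp_preferred
    return (prefer + nll).mean()
```

In such a setup, `logp_preferred` and `logp_dispreferred` would come from scoring, for example, a high-quality translation and a merely adequate one of the same source sentence, so the model learns to prefer the former while avoiding the latter.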