Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

September 30, 2024
Authors: Hippolyte Gisserot-Boukhlef, Ricardo Rei, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo, Nuno M. Guerreiro
cs.AI

Abstract

Neural metrics for machine translation (MT) evaluation have become increasingly prominent due to their superior correlation with human judgments compared to traditional lexical metrics. Researchers have therefore utilized neural metrics through quality-informed decoding strategies, achieving better results than likelihood-based methods. With the rise of Large Language Models (LLMs), preference-based alignment techniques have gained attention for their potential to enhance translation quality by optimizing model weights directly on preferences induced by quality estimators. This study focuses on Contrastive Preference Optimization (CPO) and conducts extensive experiments to evaluate the impact of preference-based alignment on translation quality. Our findings indicate that while CPO consistently outperforms Supervised Fine-Tuning (SFT) on high-quality data with regard to the alignment metric, it may lead to instability across downstream evaluation metrics, particularly between neural and lexical ones. Additionally, we demonstrate that relying solely on the base model for generating candidate translations achieves performance comparable to using multiple external systems, while ensuring better consistency across downstream metrics.
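For context, here is a minimal sketch of how preference pairs induced by a quality estimator can be assembled when candidate translations come only from the base model, the mono-system setting the abstract refers to. The helpers `generate_candidates` and `qe_score` are hypothetical placeholders for an LLM sampling routine and a reference-free QE metric (e.g., a COMET-style estimator); this is not code from the paper.

```python
from typing import Callable, Dict, List


def build_preference_pairs(
    sources: List[str],
    generate_candidates: Callable[[str, int], List[str]],  # hypothetical base-model sampler
    qe_score: Callable[[str, str], float],                  # hypothetical reference-free QE scorer
    num_candidates: int = 8,
) -> List[Dict[str, str]]:
    """For each source sentence, keep the best- and worst-scored candidates
    as the (chosen, rejected) pair consumed by preference optimization (e.g., CPO)."""
    pairs = []
    for src in sources:
        candidates = generate_candidates(src, num_candidates)
        ranked = sorted(candidates, key=lambda hyp: qe_score(src, hyp))
        pairs.append({
            "prompt": src,
            "chosen": ranked[-1],   # highest QE score -> preferred translation
            "rejected": ranked[0],  # lowest QE score -> dispreferred translation
        })
    return pairs
```

Pairs in this format are what a CPO-style objective consumes: as commonly formulated, CPO combines a reference-free contrastive term over the (chosen, rejected) log-probabilities with a negative log-likelihood term on the preferred translation, which is the alignment method the paper compares against plain SFT.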

