ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning

May 19, 2025
Authors: Jiaan Wang, Fandong Meng, Jie Zhou
cs.AI

Abstract

In recent years, the emergence of large reasoning models (LRMs), such as OpenAI-o1 and DeepSeek-R1, has shown impressive capabilities on complex problems, e.g., mathematics and coding. Some pioneering studies attempt to bring the success of LRMs to neural machine translation (MT), building LRMs with deep-reasoning MT ability via reinforcement learning (RL). Despite some progress, these attempts generally focus on a few high-resource languages, e.g., English and Chinese, leaving performance on other languages unclear. Moreover, the reward modeling methods in previous work do not fully unleash the potential of reinforcement learning for MT. In this work, we first design a new reward modeling method that compares the translation results of the policy MT model with those of a strong LRM (i.e., DeepSeek-R1-671B), and quantifies the comparisons to provide rewards. Experimental results demonstrate the superiority of this reward modeling method. Using Qwen2.5-7B-Instruct as the backbone, the trained model achieves new state-of-the-art performance in literary translation and outperforms strong LRMs, including OpenAI-o1 and DeepSeek-R1. Furthermore, we extend our method to a multilingual setting with 11 languages. With carefully designed lightweight reward modeling in RL, we can simply transfer the strong MT ability from a single direction to multiple (i.e., 90) translation directions and achieve impressive multilingual MT performance.
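The abstract describes the reward design only at a high level: the policy model's translation is compared against an exemplar translation from DeepSeek-R1-671B, and the comparison is quantified into a scalar reward. The Python sketch below illustrates that idea under stated assumptions; the `judge_compare` function, verdict labels, and reward values are hypothetical placeholders, not the paper's actual implementation.

```python
# A minimal, hypothetical sketch of exemplar-based comparative reward
# modeling: compare the policy model's translation against an exemplar
# translation from a strong LRM, then quantify the comparison into a
# scalar reward for the RL update. The judge, verdict labels, and reward
# values are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class TranslationSample:
    source: str           # source-language text
    policy_output: str    # translation produced by the policy MT model
    exemplar_output: str  # exemplar translation from the strong LRM


def judge_compare(sample: TranslationSample) -> str:
    """Placeholder pairwise judge.

    In practice this would query an evaluator (e.g., an LLM prompted to
    compare the two candidate translations of `sample.source`). Here it
    returns a fixed verdict so the sketch runs end to end.
    """
    return "tie"  # one of {"policy_better", "tie", "exemplar_better"}


# Illustrative mapping from comparison verdicts to scalar rewards.
VERDICT_TO_REWARD = {
    "policy_better": 1.0,
    "tie": 0.5,
    "exemplar_better": 0.0,
}


def comparative_reward(sample: TranslationSample) -> float:
    """Quantify the pairwise comparison into a reward for the RL update."""
    return VERDICT_TO_REWARD[judge_compare(sample)]


if __name__ == "__main__":
    sample = TranslationSample(
        source="今夜月色真美。",
        policy_output="The moonlight is beautiful tonight.",
        exemplar_output="How beautiful the moon is tonight.",
    )
    print(comparative_reward(sample))  # 0.5 under the dummy judge
```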
