ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning
May 19, 2025
Authors: Jiaan Wang, Fandong Meng, Jie Zhou
cs.AI
Abstract
In recent years, large reasoning models (LRMs), such as OpenAI-o1 and
DeepSeek-R1, have shown impressive capabilities on complex problems, e.g.,
mathematics and coding. Some pioneering studies attempt to bring the success
of LRMs to neural machine translation (MT), building LRMs with deep reasoning
MT ability via reinforcement learning (RL). Despite some progress, these
attempts generally focus on a few
high-resource languages, e.g., English and Chinese, leaving the performance on
other languages unclear. Moreover, the reward modeling methods in previous work
do not fully unleash the potential of reinforcement learning in MT. In this
work, we first design a new reward modeling method that compares the
translation results of the policy MT model with a strong LRM (i.e.,
DeepSeek-R1-671B), and quantifies the comparisons to provide rewards.
Experimental results demonstrate the superiority of the reward modeling method.
Using Qwen2.5-7B-Instruct as the backbone, the trained model achieves new
state-of-the-art performance in literary translation and outperforms strong
LRMs including OpenAI-o1 and DeepSeek-R1. Furthermore, we extend our method to
the multilingual setting with 11 languages. With carefully designed
lightweight reward modeling in RL, we can simply transfer the strong MT ability
from a single direction into multiple (i.e., 90) translation directions and
achieve impressive multilingual MT performance.
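The abstract does not spell out how the comparison with the exemplar is quantified into a reward, so the following Python sketch is an illustrative assumption, not the paper's actual implementation: a judge model compares the policy model's translation against an exemplar translation (assumed here to come from DeepSeek-R1-671B), and the judge's integer verdict is mapped to a scalar RL reward. The function name `exemplar_reward`, the prompt wording, and the -2..2 verdict scale are all hypothetical.

```python
# Hypothetical sketch of exemplar-comparison reward modeling for RL-based MT.
# The judge prompt and -2..2 scoring scale below are illustrative assumptions.

JUDGE_PROMPT = """You are a translation judge. Compare the candidate translation
against the exemplar translation of the same source text.

Source ({src_lang}): {source}
Exemplar ({tgt_lang}): {exemplar}
Candidate ({tgt_lang}): {candidate}

Answer with a single integer from -2 to 2:
-2 = candidate is much worse, 0 = comparable, 2 = candidate is much better."""


def exemplar_reward(judge, source, exemplar, candidate,
                    src_lang="English", tgt_lang="Chinese", scale=0.5):
    """Quantify a pairwise comparison into a scalar RL reward.

    `judge` is any callable that takes a prompt string and returns the
    judge model's text response (e.g., a thin wrapper around an LLM API).
    The integer verdict is clamped to [-2, 2] and scaled into [-1, 1].
    """
    prompt = JUDGE_PROMPT.format(src_lang=src_lang, tgt_lang=tgt_lang,
                                 source=source, exemplar=exemplar,
                                 candidate=candidate)
    verdict = judge(prompt)
    try:
        score = int(verdict.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0  # unparsable verdict: fall back to a neutral reward
    return max(-2, min(2, score)) * scale


# Toy usage with a stub judge that always answers "1" (candidate slightly better).
r = exemplar_reward(lambda prompt: "1",
                    source="今天天气很好。",
                    exemplar="The weather is nice today.",
                    candidate="It is a lovely day today.",
                    src_lang="Chinese", tgt_lang="English")
print(r)  # 0.5
```

Relative-to-exemplar scoring of this kind keeps the reward signal bounded and comparable across sentences, which is one plausible reason a lightweight comparison scheme could scale across many translation directions.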