Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
June 4, 2026
Authors: Hanxu Hu, Zdeněk Šnajdr, Pinzhen Chen, Jannis Vamvas, Rico Sennrich
cs.AI
Abstract
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
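To make the reward signal concrete, here is a minimal sketch of a chrF-style score used as a scalar reward. It is a simplified character n-gram F-score written for illustration, not the authors' implementation and not the exact scoring of any reference chrF tool. The function names `char_ngrams` and `chrf_reward`, the whitespace handling, and the defaults (`max_n=6`, `beta=2.0`) are assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (a simplifying assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1]: F-beta over averaged n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0.0 or r == 0.0:
        return 0.0
    # beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

In an outcome-based RL loop, a score like this would be computed between each sampled translation and its reference and passed to the policy update as the reward. The training loop itself is outside the scope of this sketch.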