Reinforcement Learning Elicits In-Context Learning for Unseen Language Translation

Abstract

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages through continued training, or even by placing a grammar book in their context. However, both methods typically overfit to specific languages and show limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of using in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to translating unseen languages given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite this lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, and they produce better translations of completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks such as math and coding and serve as a recipe for learning languages from context.
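
To make the reward signal concrete, the following is a minimal sketch of a chrF-style score used as a scalar reward. It is an illustration under stated assumptions, not the paper's implementation: the function names `char_ngrams` and `chrf_reward` are hypothetical, whitespace is simply stripped before counting n-grams, and the defaults (character n-grams up to order 6, beta = 2) follow the common chrF convention rather than any setting confirmed by the abstract.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (a simplification)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1]: F-beta over averaged n-gram precision/recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical usage: score one sampled translation against its reference.
reward = chrf_reward("the cat sat on the mat", "the cat sits on the mat")
```

In an outcome-based RL loop, a score like this would be computed once per sampled translation and passed to the policy update as the reward. Because it only compares surface character overlap, it needs no learned reward model, which is one reading of why the abstract calls the reward lightweight.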