Eliciting In-Context Learning for Unseen-Language Translation via Reinforcement Learning

Abstract

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages through continued training, or even by encoding a grammar book in their context. However, both methods typically overfit to specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing individual languages. In this paper, we propose a reinforcement learning (RL) approach to unseen-language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite this lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, yielding better translations of completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks such as math and coding to serve as a recipe for learning languages from context.
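To make the "surface-level" reward concrete, the sketch below computes a simplified sentence-level chrF (a character n-gram F-score) and wraps it as a scalar reward. The abstract does not state the chrF configuration, so character orders 1 to 6, β = 2, and whitespace removal are assumptions taken from the common chrF defaults; `translation_reward` is a hypothetical wrapper for illustration, not the authors' code. In practice a library implementation such as sacrebleu's `CHRF` metric would be the safer choice.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (a common chrF default)."""
    chars = "".join(text.split())
    return Counter(chars[i : i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF in [0, 100].

    Precision and recall are averaged over the n-gram orders that occur in
    both strings, then combined into an F-beta score (beta=2 weights recall
    more heavily than precision).
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            # Assumption: skip orders longer than either string.
            continue
        # Counter intersection keeps the minimum count, i.e. clipped matches.
        matches = sum((hyp & ref).values())
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100.0 * (1 + b2) * p * r / (b2 * p + r)


def translation_reward(model_output: str, reference: str) -> float:
    """Hypothetical RL reward: chrF of the model's translation, scaled to [0, 1]."""
    return chrf(model_output.strip(), reference.strip()) / 100.0
```

Because the reward only compares surface character n-grams with a reference, it needs no learned judge or access to the model's reasoning; any improvement in how the model uses the in-context grammar and dictionary material has to show up indirectly through closer character overlap with the reference translation.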