Reinforcement Learning Induces In-Context Learning for Unseen Language Translation

Abstract

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit to specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
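
To make the reward concrete, the following is a minimal sketch of a chrF-style score used as a scalar reward. It is a simplified illustration, not the paper's implementation: the function names `char_ngrams` and `chrf_reward`, the whitespace stripping, the n-gram orders 1 to 6, and beta = 2 are all assumptions chosen to mirror common chrF defaults, and a reference implementation would likely differ in its details.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (a simplification)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1]: F-beta over averaged n-gram precision and recall.

    Hypothetical helper for illustration; not the authors' reward code.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

In an outcome-based RL loop, a score like this would be computed once per sampled translation against the reference and passed to the policy update as the reward; no learned reward model is involved, which is what makes the signal lightweight.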