Reasoning About Grammar: Can Synthetic Linguistic Reasoning Traces Improve Low-Resource Machine Translation?

Abstract

Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning. However, LLMs often struggle to apply grammatical information effectively during translation. Inspired by recent progress in chain-of-thought reasoning, we investigate whether low-resource MT can benefit from structured intermediate steps of linguistic analysis and grammatical reasoning. We propose a pipeline that automatically generates step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks. Using Xibe and Chintang as test cases, we evaluate these traces in three settings: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT). Our results show that linguistic reasoning traces are most effective as inference-time guidance: in ICL, reliable sentence-specific traces substantially improve translation performance across most models, languages, and metrics. In contrast, using the traces as training data yields smaller and less consistent gains, because models learn the trace format but often generate erroneous content. These findings suggest that LLMs can leverage grammatical information for low-resource MT when given reliable linguistic analyses, while learning to generate such analyses remains a major bottleneck.