Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?
June 2, 2026
Authors: Renhao Pei, Yihong Liu, Sampo Pyysalo, Hinrich Schütze, Shaoxiong Ji
cs.AI
Abstract
Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning. However, LLMs often struggle to apply grammatical information effectively during translation. Inspired by recent progress in chain-of-thought reasoning, we investigate whether low-resource MT can benefit from structured intermediate steps of linguistic analysis and grammatical reasoning. We propose a pipeline for automatically generating step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks. We evaluate these traces on two test languages, Xibe and Chintang, in three settings: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT). Our results show that linguistic reasoning traces are most effective as inference-time guidance: in ICL, reliable sentence-specific traces substantially improve translation performance across most models, languages, and metrics. In contrast, using the linguistic reasoning traces as training data yields smaller and less consistent gains, as models learn the trace format but often generate erroneous reasoning content. These findings suggest that LLMs can leverage grammatical information for low-resource MT when given reliable linguistic analyses, while learning to generate such analyses remains a major bottleneck.
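To make the idea of a step-by-step linguistic reasoning trace more concrete, the following is a minimal sketch of how one might be assembled from a dependency parse, a dictionary, and a grammar-rule bank. It is not the authors' pipeline, whose details the abstract does not give. The word forms, the `DICTIONARY` and `GRAMMAR_RULES` contents, the trace wording, and the function names `parse_conllu` and `build_trace` are all assumptions made for illustration; only the ten-column tab-separated CoNLL-U layout comes from Universal Dependencies.

```python
# Toy CoNLL-U fragment. The word forms are invented placeholders,
# NOT real Xibe or Chintang data.
CONLLU = (
    "1\tkata\tkata\tNOUN\t_\tCase=Nom\t3\tnsubj\t_\t_\n"
    "2\tmiru\tmiru\tNOUN\t_\tCase=Acc\t3\tobj\t_\t_\n"
    "3\ttakoba\ttako\tVERB\t_\tTense=Past\t0\troot\t_\t_\n"
)

# Hypothetical dictionary: lemma -> English gloss.
DICTIONARY = {"kata": "child", "miru": "water", "tako": "drink"}

# Hypothetical grammar-rule bank keyed by a UD feature string.
GRAMMAR_RULES = {
    "Case=Acc": "accusative marking, so the noun is the direct object",
    "Tense=Past": "past-tense marking, so render the verb in the past",
}


def parse_conllu(text):
    """Parse simple CoNLL-U token lines into dicts.

    Assumes plain integer token IDs; multiword-token ranges and
    empty nodes are not handled in this sketch.
    """
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        feats = [] if cols[5] == "_" else cols[5].split("|")
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "feats": feats,
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


def build_trace(tokens):
    """Emit one reasoning step per token: gloss, syntactic role, matching rules."""
    by_id = {t["id"]: t for t in tokens}
    steps = []
    for t in tokens:
        gloss = DICTIONARY.get(t["lemma"], "<unknown>")
        head = "ROOT" if t["head"] == 0 else by_id[t["head"]]["form"]
        step = (
            f"Step {t['id']}: '{t['form']}' is a {t['upos']} "
            f"(lemma '{t['lemma']}', gloss '{gloss}'), "
            f"{t['deprel']} of {head}."
        )
        for feat in t["feats"]:
            if feat in GRAMMAR_RULES:
                step += f" Rule [{feat}]: {GRAMMAR_RULES[feat]}."
        steps.append(step)
    return steps


if __name__ == "__main__":
    for step in build_trace(parse_conllu(CONLLU)):
        print(step)
```

In an ICL setting of the kind the abstract describes, the resulting steps would be placed in the prompt alongside the source sentence. How the traces are actually worded and ordered in the paper is not specified here.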