Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?

June 2, 2026
Authors: Renhao Pei, Yihong Liu, Sampo Pyysalo, Hinrich Schütze, Shaoxiong Ji
cs.AI

Abstract

Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning. However, LLMs often struggle to apply grammatical information effectively during translation. Inspired by recent progress in chain-of-thought reasoning, we investigate whether low-resource MT can benefit from structured intermediate steps of linguistic analysis and grammatical reasoning. We propose a pipeline for automatically generating step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks. Using Xibe and Chintang as test cases, we evaluate these traces in three settings: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT). Our results show that linguistic reasoning traces are most effective as inference-time guidance: in ICL, reliable sentence-specific traces substantially improve translation performance across most models, languages, and metrics. In contrast, using the linguistic reasoning traces as training data yields smaller and less consistent gains, as models learn the trace format but often generate erroneous content. These findings suggest that LLMs can leverage grammatical information for low-resource MT when given reliable linguistic analyses, while learning to generate such analyses remains a major bottleneck.
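The abstract does not specify the trace format. As a rough illustration only, the sketch below shows one way a step-by-step trace could be assembled from a CoNLL-U sentence, a lemma-to-gloss dictionary, and a grammar-rule bank. The trace wording, the keying of rules on `(feature, value)` pairs, the helper names `parse_conllu` and `build_trace`, and the invented two-word sentence are all hypothetical assumptions. None of this is the authors' pipeline, and none of it is real Xibe or Chintang data.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int
    form: str
    lemma: str
    upos: str
    feats: dict[str, str]
    head: int
    deprel: str


def parse_conllu(block: str) -> list[Token]:
    """Parse one CoNLL-U sentence (10 tab-separated columns per token line)."""
    tokens = []
    for line in block.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("3.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        feats = {}
        if cols[5] != "_":  # column 6 is FEATS, e.g. "Number=Sing|Case=Nom"
            for pair in cols[5].split("|"):
                key, value = pair.split("=", 1)
                feats[key] = value
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            feats, int(cols[6]), cols[7]))
    return tokens


def build_trace(tokens: list[Token], dictionary: dict[str, str],
                rule_bank: dict[tuple[str, str], str]) -> str:
    """Hypothetical trace: one step per token, plus any matching grammar rules."""
    by_idx = {t.idx: t for t in tokens}
    lines = []
    for step, tok in enumerate(tokens, start=1):
        gloss = dictionary.get(tok.lemma, "<no entry>")
        head = by_idx[tok.head].form if tok.head in by_idx else "ROOT"
        lines.append(f"Step {step}: '{tok.form}' (lemma '{tok.lemma}', {tok.upos}) "
                     f"glossed as '{gloss}'; relation {tok.deprel} -> {head}.")
        # Attach every rule whose (feature, value) key matches this token.
        for key, value in sorted(tok.feats.items()):
            rule = rule_bank.get((key, value))
            if rule is not None:
                lines.append(f"  Grammar ({key}={value}): {rule}")
    return "\n".join(lines)


# Invented toy language, dictionary, and rule bank (hypothetical, for illustration).
SAMPLE = "\n".join([
    "# text = kari olo-ma",
    "1\tkari\tkari\tNOUN\t_\tNumber=Sing\t2\tnsubj\t_\t_",
    "2\tolo-ma\tolo\tVERB\t_\tTense=Pres\t0\troot\t_\t_",
])
DICTIONARY = {"kari": "bird", "olo": "sing"}
RULE_BANK = {("Tense", "Pres"): "present tense is marked by the suffix -ma (toy rule)"}

if __name__ == "__main__":
    print(build_trace(parse_conllu(SAMPLE), DICTIONARY, RULE_BANK))
```

Keying rules on UD morphological features is just one simple choice; a real rule bank could instead match constructions that span several tokens. A trace like this could then be placed in an ICL prompt ahead of the translation request, or paired with a reference translation as SFT data.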