

Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

February 5, 2026
Authors: Junxiao Liu, Zhijun Wang, Yixiao Li, Zhejian Lai, Liqian Huang, Xin Huang, Xue Han, Junlan Feng, Shujian Huang
cs.AI

Abstract

Long reasoning models often struggle in multilingual settings: they tend to reason in English for non-English questions, and when constrained to reason in the question language, their accuracy drops substantially. This struggle stems from limited ability in both multilingual question understanding and multilingual reasoning. To address both problems, we propose TRIT (Translation-Reasoning Integrated Training), a self-improving framework that integrates translation training into multilingual reasoning. Without external feedback or additional multilingual data, our method jointly enhances multilingual question understanding and response generation. On MMATH, our method outperforms multiple baselines by an average of 7 percentage points, improving both answer correctness and language consistency. Further analysis reveals that integrating translation training improves cross-lingual question alignment by over 10 percentage points and enhances translation quality for both mathematical questions and general-domain text, with gains of up to 8.4 COMET points on FLORES-200.
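The abstract reports that the method is optimized for two objectives at once: answer correctness and language consistency (answering in the question's language). As a purely illustrative sketch, not the paper's actual training code, a combined reward over these two axes might look like the following; `detect_language` is a toy stub and `lam` a hypothetical weighting parameter.

```python
# Hypothetical sketch of a joint reward over answer correctness and
# language consistency, the two axes the abstract reports improving.
# Not the paper's implementation: detect_language is a toy CJK check,
# and lam is an assumed weighting hyperparameter.

def detect_language(text: str) -> str:
    """Toy detector: 'zh' if the text contains any CJK character, else 'en'."""
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"

def reward(answer: str, gold: str, question_lang: str, lam: float = 0.5) -> float:
    """Blend exact-match correctness with language consistency."""
    correct = 1.0 if answer.strip() == gold.strip() else 0.0
    consistent = 1.0 if detect_language(answer) == question_lang else 0.0
    return (1 - lam) * correct + lam * consistent

print(reward("答案是 4", "答案是 4", "zh"))       # correct and consistent -> 1.0
print(reward("The answer is 4", "答案是 4", "zh"))  # wrong match, wrong language -> 0.0
```

A scalar reward of this shape is what an RL-style self-improvement loop would maximize; in practice the correctness check for math answers would use a symbolic verifier rather than exact string match.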