Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

Abstract

Long reasoning models often struggle in multilingual settings: given a non-English question they tend to reason in English, and when constrained to reason in the question's language their accuracy drops substantially. This difficulty stems from limited ability in both multilingual question understanding and multilingual reasoning. To address both problems, we propose TRIT (Translation-Reasoning Integrated Training), a self-improving framework that integrates translation training into multilingual reasoning. Without external feedback or additional multilingual data, our method jointly enhances multilingual question understanding and response generation. On MMATH, our method outperforms multiple baselines by an average of 7 percentage points, improving both answer correctness and language consistency. Further analysis shows that integrating translation training improves cross-lingual question alignment by over 10 percentage points and enhances translation quality for both mathematical questions and general-domain text, with gains of up to 8.4 COMET points on FLORES-200.

