
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought

October 5, 2025
Authors: Guijin Son, Donghun Yang, Hitesh Laxmichand Patel, Amit Agarwal, Hyunwoo Ko, Chanuk Lim, Srikant Panda, Minhyuk Kim, Nikunj Drolia, Dasol Choi, Kyong-Ha Lee, Youngjae Yu
cs.AI

Abstract

Recent frontier models employ long chain-of-thought reasoning to explore solution spaces in context and achieve stronger performance. While many works study distillation to build smaller yet capable models, most focus on English and little is known about language-specific reasoning. To bridge this gap, we first introduce **Language-Mixed CoT**, a reasoning schema that switches between English and a target language, using English as an anchor to excel in reasoning while minimizing translation artifacts. As a Korean case study, we curate **Yi-Sang**: 5.79M native-Korean prompts from web Q&A, exams, STEM, and code; 3.7M long reasoning traces generated from Qwen3-32B; and a targeted 260k high-yield subset. We train nine models (4B-35B) across six families (Qwen2.5, Llama-3.1, Gemma-3, etc.). Our best model, **KO-REAson-35B**, achieves state-of-the-art performance, with the highest overall average score (64.0 ± 25), ranking first on 5/9 benchmarks and second on the remainder. Smaller and mid-sized models also benefit substantially, with an average improvement of +18.6 points across the nine evaluated benchmarks. Ablations show **Language-Mixed CoT** is more effective than monolingual CoT, and also yields cross-lingual and multi-modal performance gains. We release our data-curation pipeline, evaluation system, datasets, and models to advance research on language-specific reasoning. Data and model collection: https://huggingface.co/KOREAson.
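
To make the trace-generation step concrete, below is a minimal sketch of how language-mixed reasoning traces could be collected from a teacher model. It assumes an OpenAI-compatible endpoint (e.g., vLLM) serving Qwen3-32B; the system prompt, endpoint URL, and sampling settings are illustrative assumptions, not the paper's actual template.

```python
# Minimal sketch: collecting Language-Mixed CoT traces from a teacher model.
# Assumptions (not from the paper): an OpenAI-compatible server (e.g., vLLM)
# at localhost:8000 serving Qwen3-32B, and an illustrative system prompt.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Illustrative instruction: reason with English as the anchor language while
# keeping Korean terms from the prompt in Korean, then answer in Korean.
SYSTEM_PROMPT = (
    "Think step by step. Use English as the anchor language for reasoning, "
    "but keep Korean terms from the question in Korean instead of translating "
    "them. Give the final answer in Korean."
)

def generate_trace(korean_prompt: str) -> str:
    """Return one long reasoning trace for a native-Korean prompt."""
    response = client.chat.completions.create(
        model="Qwen/Qwen3-32B",          # teacher model named in the abstract
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": korean_prompt},
        ],
        temperature=0.6,                  # assumed sampling settings
        max_tokens=8192,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_trace("이차방정식 x^2 - 5x + 6 = 0 의 해를 구하세요."))
```

In this kind of pipeline, the resulting (prompt, trace) pairs would then be filtered into a high-yield subset and used to fine-tune the smaller student models.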