CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning
January 19, 2026
Authors: Eric Onyame, Akash Ghosh, Subhadip Baidya, Sriparna Saha, Xiuying Chen, Chirag Agarwal
cs.AI
Abstract
While large language models (LLMs) have been shown to perform well on monolingual mathematical and commonsense reasoning, they remain unreliable for multilingual medical reasoning applications, hindering their deployment in multilingual healthcare settings. We address this by first introducing CUREMED-BENCH, a high-quality multilingual medical reasoning dataset of open-ended reasoning queries, each with a single verifiable answer, spanning thirteen languages, including underrepresented languages such as Amharic, Yoruba, and Swahili. Building on this dataset, we propose CURE-MED, a curriculum-informed reinforcement learning framework that integrates code-switching-aware supervised fine-tuning and Group Relative Policy Optimization to jointly improve logical correctness and language stability. Across thirteen languages, our approach consistently outperforms strong baselines and scales effectively, achieving 85.21% language consistency and 54.35% logical correctness at 7B parameters, and 94.96% language consistency and 70.04% logical correctness at 32B parameters. These results support reliable and equitable multilingual medical reasoning in LLMs. The code and dataset are available at https://cure-med.github.io/.
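To make the abstract's training objective concrete, below is a minimal sketch (not the authors' code) of how a GRPO-style group-relative advantage could combine a verifiable-answer correctness check with a language-consistency check, as described above. The reward weights, the exact-match scoring, and the detected-language field are hypothetical assumptions for illustration only; the actual CURE-MED reward design may differ.

```python
# Minimal sketch of a GRPO-style update signal combining answer correctness
# with language consistency. All weights and heuristics here are assumptions.
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Sample:
    text: str      # generated reasoning + answer
    answer: str    # extracted final answer
    language: str  # detected language of the generation (hypothetical field)


def reward(sample: Sample, gold_answer: str, target_language: str,
           w_correct: float = 1.0, w_lang: float = 0.5) -> float:
    """Hypothetical scalar reward: verifiable-answer match plus language stability."""
    correct = 1.0 if sample.answer.strip().lower() == gold_answer.strip().lower() else 0.0
    consistent = 1.0 if sample.language == target_language else 0.0
    return w_correct * correct + w_lang * consistent


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled response against its own group."""
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]


# Toy usage: one Swahili query with a group of four sampled completions.
group = [
    Sample("...", "pneumonia", "sw"),
    Sample("...", "pneumonia", "en"),   # correct answer, but code-switched to English
    Sample("...", "bronchitis", "sw"),
    Sample("...", "bronchitis", "en"),
]
rs = [reward(s, gold_answer="pneumonia", target_language="sw") for s in group]
print(group_relative_advantages(rs))
```

Under this sketch, a response that is both correct and stays in the query language receives the highest group-relative advantage, which is one plausible way to "jointly improve logical correctness and language stability" as the abstract states.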