

CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning

January 19, 2026
Authors: Eric Onyame, Akash Ghosh, Subhadip Baidya, Sriparna Saha, Xiuying Chen, Chirag Agarwal
cs.AI

Abstract

While large language models (LLMs) have been shown to perform well on monolingual mathematical and commonsense reasoning, they remain unreliable for multilingual medical reasoning, hindering their deployment in multilingual healthcare settings. We address this by first introducing CUREMED-BENCH, a high-quality multilingual medical reasoning dataset of open-ended reasoning queries, each with a single verifiable answer, spanning thirteen languages, including underrepresented languages such as Amharic, Yoruba, and Swahili. Building on this dataset, we propose CURE-MED, a curriculum-informed reinforcement learning framework that integrates code-switching-aware supervised fine-tuning with Group Relative Policy Optimization to jointly improve logical correctness and language stability. Across thirteen languages, our approach consistently outperforms strong baselines and scales effectively, achieving 85.21% language consistency and 54.35% logical correctness at 7B parameters, and 94.96% language consistency and 70.04% logical correctness at 32B parameters. These results support reliable and equitable multilingual medical reasoning in LLMs. The code and dataset are available at https://cure-med.github.io/.
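To make the training objective concrete, below is a minimal sketch of the group-relative advantage computation used in GRPO, paired with a combined reward covering the two goals the abstract names: logical correctness (the answer matches a single verifiable reference) and language stability (the output stays in the query's language). The reward weights, the `Completion` fields, and the exact-match/in-language-ratio scoring are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of GRPO's group-relative advantages with a combined
# correctness + language-stability reward. Weights, field names, and
# scoring rules here are assumptions for illustration only.
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Completion:
    answer: str               # final answer extracted from the model output
    target_lang_ratio: float  # fraction of output tokens in the query language

def reward(c: Completion, reference: str,
           w_correct: float = 1.0, w_lang: float = 0.5) -> float:
    """Combined reward: verifiable correctness plus language consistency."""
    correct = 1.0 if c.answer.strip().lower() == reference.strip().lower() else 0.0
    return w_correct * correct + w_lang * c.target_lang_ratio

def group_relative_advantages(group: list[Completion],
                              reference: str) -> list[float]:
    """GRPO scores each sampled completion relative to its own group:
    advantage = (reward - group mean) / group std, with no value critic."""
    rewards = [reward(c, reference) for c in group]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-6) for r in rewards]

# Example: four sampled completions for one Swahili medical query.
group = [
    Completion("malaria", 0.95),  # correct, mostly in-language
    Completion("malaria", 0.40),  # correct, heavy code-switching
    Completion("typhoid", 0.90),  # wrong, in-language
    Completion("typhoid", 0.30),  # wrong, code-switched
]
print(group_relative_advantages(group, reference="Malaria"))
```

Under this shaping, a correct answer that code-switches away from the query language earns a lower advantage than a correct in-language answer, which is one plausible way the two objectives could be optimized jointly as the abstract describes.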