

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

June 11, 2025
作者: Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Yu Rong, Wenbing Huang, Qifeng Bai, Tingyang Xu
cs.AI

Abstract

Though reasoning-based large language models (LLMs) have excelled in mathematics and programming, their capabilities in knowledge-intensive medical question answering remain underexplored. To address this, we introduce ReasonMed, the largest medical reasoning dataset, comprising 370k high-quality examples distilled from 1.7 million initial reasoning paths generated by various LLMs. ReasonMed is constructed through a multi-agent verification and refinement process, where we design an Error Refiner to enhance the reasoning paths by identifying and correcting error-prone steps flagged by a verifier. Leveraging ReasonMed, we systematically investigate best practices for training medical reasoning models and find that combining detailed Chain-of-Thought (CoT) reasoning with concise answer summaries yields the most effective fine-tuning strategy. Based on this strategy, we train ReasonMed-7B, which sets a new benchmark for sub-10B models, outperforming the prior best by 4.17\% and even exceeding LLaMA3.1-70B on PubMedQA by 4.60\%.
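The abstract describes a multi-agent pipeline in which a verifier flags error-prone steps in a reasoning path and an Error Refiner corrects them. The following is a minimal sketch of that verify-and-refine loop; the toy `verify` and `refine` functions are illustrative stand-ins, since the paper's actual agents are LLM-based and their prompts and interfaces are not given here.

```python
# Hedged sketch of a verify-and-refine loop over a reasoning path.
# The verifier and Error Refiner below are simplified stand-ins for
# the LLM agents described in the abstract.

def verify(path):
    """Toy verifier: flag any step containing the marker 'ERROR'."""
    return [i for i, step in enumerate(path) if "ERROR" in step]

def refine(path, flagged):
    """Toy Error Refiner: rewrite only the flagged steps."""
    fixed = list(path)
    for i in flagged:
        fixed[i] = fixed[i].replace("ERROR", "corrected")
    return fixed

def verify_and_refine(path, max_rounds=3):
    """Iterate verify -> refine until no step is flagged, or give up."""
    for _ in range(max_rounds):
        flagged = verify(path)
        if not flagged:
            return path, True    # path accepted
        path = refine(path, flagged)
    return path, False           # still flawed after max_rounds

raw = [
    "Step 1: identify the presenting symptoms",
    "Step 2: ERROR in dosage calculation",
    "Step 3: conclude with the answer",
]
cleaned, accepted = verify_and_refine(raw)
```

In the paper's setting, paths that survive this filter (370k of 1.7M candidates) form the final dataset; the sketch only shows the control flow of that selection.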
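The abstract also reports that the most effective fine-tuning target pairs a detailed chain-of-thought with a concise answer summary. A minimal sketch of assembling such a target follows; the `<think>` template and field names are assumptions for illustration, not the paper's exact format.

```python
# Hedged sketch: building a supervised fine-tuning target that combines
# detailed CoT reasoning with a concise answer summary. The template and
# field names are illustrative assumptions.

def build_target(cot_steps, answer):
    """Join numbered reasoning steps with a short final answer."""
    reasoning = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(cot_steps))
    return f"<think>\n{reasoning}\n</think>\nAnswer: {answer}"

example = {
    "question": "Deficiency of which vitamin causes scurvy?",
    "target": build_target(
        [
            "Scurvy results from impaired collagen synthesis.",
            "Collagen hydroxylation requires ascorbic acid as a cofactor.",
        ],
        "Vitamin C",
    ),
}
```

The design choice the abstract highlights is that neither the long CoT alone nor the short answer alone trains as well as both together, so each example carries both fields in one target string.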
PDF · June 13, 2025