

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

June 11, 2025
作者: Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Yu Rong, Wenbing Huang, Qifeng Bai, Tingyang Xu
cs.AI

Abstract

Though reasoning-based large language models (LLMs) have excelled in mathematics and programming, their capabilities in knowledge-intensive medical question answering remain underexplored. To address this, we introduce ReasonMed, the largest medical reasoning dataset, comprising 370k high-quality examples distilled from 1.7 million initial reasoning paths generated by various LLMs. ReasonMed is constructed through a multi-agent verification and refinement process, where we design an Error Refiner to enhance the reasoning paths by identifying and correcting error-prone steps flagged by a verifier. Leveraging ReasonMed, we systematically investigate best practices for training medical reasoning models and find that combining detailed Chain-of-Thought (CoT) reasoning with concise answer summaries yields the most effective fine-tuning strategy. Based on this strategy, we train ReasonMed-7B, which sets a new benchmark for sub-10B models, outperforming the prior best by 4.17\% and even exceeding LLaMA3.1-70B on PubMedQA by 4.60\%.
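The abstract describes a multi-agent pipeline in which a verifier flags error-prone steps in a reasoning path and an Error Refiner corrects them. The following is a minimal sketch of that verify-and-refine loop; the toy `verify` and `refine` functions are illustrative stand-ins, since the paper's actual agents are LLM-based and their prompts and interfaces are not given here.

```python
# Hedged sketch of a verify-and-refine loop over a reasoning path.
# The verifier and Error Refiner below are simplified stand-ins for
# the LLM agents described in the abstract.

def verify(path):
    """Toy verifier: flag any step containing the marker 'ERROR'."""
    return [i for i, step in enumerate(path) if "ERROR" in step]

def refine(path, flagged):
    """Toy Error Refiner: rewrite only the flagged steps."""
    fixed = list(path)
    for i in flagged:
        fixed[i] = fixed[i].replace("ERROR", "corrected")
    return fixed

def verify_and_refine(path, max_rounds=3):
    """Iterate verify -> refine until no step is flagged, or give up."""
    for _ in range(max_rounds):
        flagged = verify(path)
        if not flagged:
            return path, True    # path accepted
        path = refine(path, flagged)
    return path, False           # still flawed after max_rounds

raw = [
    "Step 1: identify the presenting symptoms",
    "Step 2: ERROR in dosage calculation",
    "Step 3: conclude with the answer",
]
cleaned, accepted = verify_and_refine(raw)
```

In the paper's setting, paths that survive this filter (370k of 1.7M candidates) form the final dataset; the sketch only shows the control flow of that selection.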
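The abstract also reports that the most effective fine-tuning target pairs a detailed chain-of-thought with a concise answer summary. A minimal sketch of assembling such a target follows; the `<think>` template and field names are assumptions for illustration, not the paper's exact format.

```python
# Hedged sketch: building a supervised fine-tuning target that combines
# detailed CoT reasoning with a concise answer summary. The template and
# field names are illustrative assumptions.

def build_target(cot_steps, answer):
    """Join numbered reasoning steps with a short final answer."""
    reasoning = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(cot_steps))
    return f"<think>\n{reasoning}\n</think>\nAnswer: {answer}"

example = {
    "question": "Deficiency of which vitamin causes scurvy?",
    "target": build_target(
        [
            "Scurvy results from impaired collagen synthesis.",
            "Collagen hydroxylation requires ascorbic acid as a cofactor.",
        ],
        "Vitamin C",
    ),
}
```

The design choice the abstract highlights is that neither the long CoT alone nor the short answer alone trains as well as both together, so each example carries both fields in one target string.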
PDF · June 13, 2025