ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
June 11, 2025
Authors: Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Yu Rong, Wenbing Huang, Qifeng Bai, Tingyang Xu
cs.AI
Abstract
Though reasoning-based large language models (LLMs) have excelled in
mathematics and programming, their capabilities in knowledge-intensive medical
question answering remain underexplored. To address this, we introduce
ReasonMed, the largest medical reasoning dataset, comprising 370k high-quality
examples distilled from 1.7 million initial reasoning paths generated by
various LLMs. ReasonMed is constructed through a multi-agent
verification and refinement process, where we design an Error Refiner
to enhance the reasoning paths by identifying and correcting error-prone steps
flagged by a verifier. Leveraging ReasonMed, we systematically investigate best
practices for training medical reasoning models and find that combining
detailed Chain-of-Thought (CoT) reasoning with concise answer summaries yields
the most effective fine-tuning strategy. Based on this strategy, we train
ReasonMed-7B, which sets a new benchmark for sub-10B models, outperforming the
prior best by 4.17% and even exceeding LLaMA3.1-70B on PubMedQA by 4.60%.
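
To make the multi-agent verification and refinement process concrete, below is a minimal Python sketch of the distillation loop: a verifier flags error-prone steps in a candidate reasoning path, and an Error Refiner rewrites only the flagged steps before re-verification. Every class and function name here (ReasoningPath, Verifier, ErrorRefiner, distill) is a hypothetical stand-in; the abstract does not specify the paper's actual agent interfaces or prompts.

```python
# Minimal sketch of a verify-then-refine distillation loop.
# All names are hypothetical; in the paper, the verifier and refiner
# would be LLM agents rather than the stubs shown here.
from dataclasses import dataclass, field


@dataclass
class ReasoningPath:
    question: str
    steps: list[str]
    answer: str
    flagged: list[int] = field(default_factory=list)  # indices of error-prone steps


class Verifier:
    """Flags steps whose logic or medical facts look wrong (stub)."""

    def check(self, path: ReasoningPath) -> list[int]:
        # In practice an LLM judge would return flagged step indices.
        return []


class ErrorRefiner:
    """Rewrites only the steps the verifier flagged (stub)."""

    def refine(self, path: ReasoningPath) -> ReasoningPath:
        for i in path.flagged:
            path.steps[i] = f"[revised] {path.steps[i]}"
        path.flagged = []
        return path


def distill(paths: list[ReasoningPath], verifier: Verifier,
            refiner: ErrorRefiner, max_rounds: int = 2) -> list[ReasoningPath]:
    """Keep paths that pass verification, refining flagged ones up to max_rounds."""
    kept = []
    for path in paths:
        for _ in range(max_rounds):
            path.flagged = verifier.check(path)
            if not path.flagged:          # clean path: keep it
                kept.append(path)
                break
            path = refiner.refine(path)   # fix flagged steps, then re-verify
    return kept                           # paths still flagged are discarded
```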
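
The reported best fine-tuning strategy pairs detailed Chain-of-Thought reasoning with a concise answer summary in each training example. The sketch below shows one hypothetical way to lay out such an example; the field names and formatting are assumptions for illustration, not the paper's actual data schema.

```python
# Hypothetical formatting of one fine-tuning example: detailed CoT steps
# followed by a concise answer summary, per the strategy the abstract
# reports as most effective.
def format_example(question: str, cot_steps: list[str], summary: str) -> dict:
    reasoning = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(cot_steps))
    return {
        "prompt": question,
        "response": f"{reasoning}\n\nAnswer: {summary}",
    }


example = format_example(
    question="Which vitamin deficiency causes scurvy?",
    cot_steps=[
        "Scurvy presents with bleeding gums, petechiae, and poor wound healing.",
        "These signs reflect defective collagen synthesis.",
        "Collagen hydroxylation requires ascorbic acid as a cofactor.",
    ],
    summary="Vitamin C (ascorbic acid) deficiency.",
)
print(example["response"])
```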