Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
June 13, 2025
Authors: Jaehoon Yun, Jiwoong Sohn, Jungwoo Park, Hyunjae Kim, Xiangru Tang, Yanjun Shao, Yonghoe Koo, Minhyeok Ko, Qingyu Chen, Mark Gerstein, Michael Moor, Jaewoo Kang
cs.AI
Abstract
Large language models have shown promise in clinical decision making, but
current approaches struggle to localize and correct errors at specific steps of
the reasoning process. This limitation is critical in medicine, where
identifying and addressing reasoning errors is essential for accurate diagnosis
and effective patient care. We introduce Med-PRM, a process reward modeling
framework that leverages retrieval-augmented generation to verify each
reasoning step against established medical knowledge bases. By verifying
intermediate reasoning steps with evidence retrieved from clinical guidelines
and literature, our model can precisely assess the reasoning quality in a
fine-grained manner. Evaluations on five medical QA benchmarks and two
open-ended diagnostic tasks demonstrate that Med-PRM achieves state-of-the-art
performance, improving base models by up to 13.50%. Moreover, we
demonstrate the generality of Med-PRM by
integrating it in a plug-and-play fashion with strong policy models such as
Meerkat, achieving over 80% accuracy on MedQA for the first time using
small-scale models of 8 billion parameters. Our code and data are available at:
https://med-prm.github.io/
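The core idea of the abstract can be illustrated in code: split a model's chain of thought into steps, retrieve supporting evidence for each step from a knowledge base, score each step individually, and aggregate the per-step scores into a single process reward. The following is a minimal, hypothetical sketch; the retriever, the keyword-overlap verifier, and the min-aggregation are illustrative stand-ins chosen for clarity, not the actual Med-PRM components.

```python
# Hypothetical sketch of guideline-verified process reward scoring.
# The retriever, step scorer, and aggregation rule below are toy
# stand-ins, not the Med-PRM implementation.

def retrieve_evidence(step: str, knowledge_base: dict) -> list:
    """Toy retriever: return guideline snippets whose key appears in the step."""
    return [text for key, text in knowledge_base.items() if key in step.lower()]

def score_step(step: str, evidence: list) -> float:
    """Toy verifier: reward a step only if some retrieved snippet supports it."""
    return 1.0 if evidence else 0.0

def process_reward(steps, knowledge_base):
    """Score each intermediate reasoning step, then aggregate.

    Taking the minimum step score means a single unsupported step
    sinks the whole chain -- one common aggregation choice for
    process reward models.
    """
    step_scores = [score_step(s, retrieve_evidence(s, knowledge_base))
                   for s in steps]
    return step_scores, min(step_scores)

if __name__ == "__main__":
    kb = {
        "metformin": "Guideline: metformin is first-line therapy for type 2 diabetes.",
        "hba1c": "Guideline: HbA1c >= 6.5% supports a diagnosis of diabetes.",
    }
    chain = [
        "The patient's HbA1c of 7.2% meets the diagnostic threshold.",
        "Start metformin as first-line therapy.",
    ]
    scores, reward = process_reward(chain, kb)
    print(scores, reward)  # each step finds supporting evidence
```

In this toy setup the fine-grained signal is the per-step score list, which is what lets a process reward model localize the specific step where reasoning goes wrong, rather than judging only the final answer.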