Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
June 13, 2025
Authors: Jaehoon Yun, Jiwoong Sohn, Jungwoo Park, Hyunjae Kim, Xiangru Tang, Yanjun Shao, Yonghoe Koo, Minhyeok Ko, Qingyu Chen, Mark Gerstein, Michael Moor, Jaewoo Kang
cs.AI
Abstract
Large language models have shown promise in clinical decision making, but
current approaches struggle to localize and correct errors at specific steps of
the reasoning process. This limitation is critical in medicine, where
identifying and addressing reasoning errors is essential for accurate diagnosis
and effective patient care. We introduce Med-PRM, a process reward modeling
framework that leverages retrieval-augmented generation to verify each
reasoning step against established medical knowledge bases. By verifying
intermediate reasoning steps with evidence retrieved from clinical guidelines
and literature, our model can precisely assess the reasoning quality in a
fine-grained manner. Evaluations on five medical QA benchmarks and two
open-ended diagnostic tasks demonstrate that Med-PRM achieves state-of-the-art
performance, improving the performance of base models by up to 13.50%.
Moreover, we demonstrate the generality of Med-PRM by
integrating it in a plug-and-play fashion with strong policy models such as
Meerkat, achieving over 80% accuracy on MedQA for the first time using
small-scale models of 8 billion parameters. Our code and data are available at:
https://med-prm.github.io/
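To make the process-reward idea concrete, the sketch below shows how a step-wise verifier can drive best-of-N answer selection: each intermediate reasoning step is scored against retrieved evidence, per-step rewards are aggregated, and the highest-scoring candidate chain is kept. This is a minimal illustrative sketch, not the Med-PRM implementation; the keyword-overlap scorer stands in for a trained reward model, and all names (`score_step`, `score_chain`, `best_of_n`) are hypothetical.

```python
# Illustrative sketch of process-reward-guided best-of-N selection.
# A real PRM would be a trained model judging each step against retrieved
# clinical guideline text; keyword overlap is a toy placeholder scorer.

def score_step(step: str, evidence: str) -> float:
    """Fraction of the step's words that appear in the evidence."""
    step_words = set(step.lower().split())
    evidence_words = set(evidence.lower().split())
    if not step_words:
        return 0.0
    return len(step_words & evidence_words) / len(step_words)

def score_chain(steps: list[str], evidence: str) -> float:
    """Aggregate per-step rewards; taking the min penalizes any weak step."""
    return min(score_step(s, evidence) for s in steps)

def best_of_n(candidates: list[list[str]], evidence: str) -> list[str]:
    """Keep the candidate reasoning chain with the highest aggregate reward."""
    return max(candidates, key=lambda c: score_chain(c, evidence))

# Toy example: two candidate reasoning chains for the same question.
evidence = "metformin is first line therapy for type 2 diabetes"
candidates = [
    ["start insulin immediately", "insulin is first line"],
    ["patient has type 2 diabetes", "metformin is first line therapy"],
]
best = best_of_n(candidates, evidence)  # selects the evidence-supported chain
```

Because the verifier only scores candidate chains, it can be attached to any policy model that samples multiple reasoning traces, which is the "plug-and-play" property the abstract describes.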