Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
June 13, 2025
Authors: Jaehoon Yun, Jiwoong Sohn, Jungwoo Park, Hyunjae Kim, Xiangru Tang, Yanjun Shao, Yonghoe Koo, Minhyeok Ko, Qingyu Chen, Mark Gerstein, Michael Moor, Jaewoo Kang
cs.AI
Abstract
Large language models have shown promise in clinical decision making, but
current approaches struggle to localize and correct errors at specific steps of
the reasoning process. This limitation is critical in medicine, where
identifying and addressing reasoning errors is essential for accurate diagnosis
and effective patient care. We introduce Med-PRM, a process reward modeling
framework that leverages retrieval-augmented generation to verify each
reasoning step against established medical knowledge bases. By verifying
intermediate reasoning steps with evidence retrieved from clinical guidelines
and literature, our model can precisely assess the reasoning quality in a
fine-grained manner. Evaluations on five medical QA benchmarks and two
open-ended diagnostic tasks demonstrate that Med-PRM achieves state-of-the-art
performance, improving the performance of base models by up to 13.50%.
Moreover, we demonstrate the generality of Med-PRM by
integrating it in a plug-and-play fashion with strong policy models such as
Meerkat, achieving over 80% accuracy on MedQA for the first time using
small-scale models of 8 billion parameters. Our code and data are available at:
https://med-prm.github.io/
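To make the process-reward idea concrete, the sketch below shows how a step-wise verifier can drive best-of-N answer selection: each intermediate reasoning step is scored against retrieved evidence, per-step rewards are aggregated, and the highest-scoring candidate chain is kept. This is a minimal illustrative sketch, not the Med-PRM implementation; the keyword-overlap scorer stands in for a trained reward model, and all names (`score_step`, `score_chain`, `best_of_n`) are hypothetical.

```python
# Illustrative sketch of process-reward-guided best-of-N selection.
# A real PRM would be a trained model judging each step against retrieved
# clinical guideline text; keyword overlap is a toy placeholder scorer.

def score_step(step: str, evidence: str) -> float:
    """Fraction of the step's words that appear in the evidence."""
    step_words = set(step.lower().split())
    evidence_words = set(evidence.lower().split())
    if not step_words:
        return 0.0
    return len(step_words & evidence_words) / len(step_words)

def score_chain(steps: list[str], evidence: str) -> float:
    """Aggregate per-step rewards; taking the min penalizes any weak step."""
    return min(score_step(s, evidence) for s in steps)

def best_of_n(candidates: list[list[str]], evidence: str) -> list[str]:
    """Keep the candidate reasoning chain with the highest aggregate reward."""
    return max(candidates, key=lambda c: score_chain(c, evidence))

# Toy example: two candidate reasoning chains for the same question.
evidence = "metformin is first line therapy for type 2 diabetes"
candidates = [
    ["start insulin immediately", "insulin is first line"],
    ["patient has type 2 diabetes", "metformin is first line therapy"],
]
best = best_of_n(candidates, evidence)  # selects the evidence-supported chain
```

Because the verifier only scores candidate chains, it can be attached to any policy model that samples multiple reasoning traces, which is the "plug-and-play" property the abstract describes.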