Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards

Abstract

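Large language models have shown promise in clinical decision-making, but current approaches struggle to localize and correct errors at specific steps of the reasoning process. This limitation is especially critical in medicine, where identifying and addressing reasoning errors is essential for accurate diagnosis and effective patient care. We introduce Med-PRM, a process reward modeling framework that uses retrieval-augmented generation to verify each reasoning step against established medical knowledge bases. By checking intermediate reasoning steps against evidence retrieved from clinical guidelines and the literature, our model can assess reasoning quality at the level of individual steps. Evaluations on five medical QA benchmarks and two open-ended diagnostic tasks show that Med-PRM achieves state-of-the-art performance, improving the performance of base models by up to 13.50%. We further demonstrate the generality of Med-PRM by integrating it in a plug-and-play fashion with strong policy models such as Meerkat, achieving over 80% accuracy on MedQA for the first time with a small-scale model of 8 billion parameters. Our code and data are available at https://med-prm.github.io/.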
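The abstract does not spell out how a process reward model is attached to a policy model. One common pattern, and a plausible reading of "plug-and-play", is best-of-N reranking: sample several reasoning chains from the policy, score every step against retrieved evidence, collapse the step scores into one solution score, and keep the top chain. The Python sketch that follows illustrates that pattern under stated assumptions. `Candidate`, `score_solution`, `best_of_n`, the per-step retrieval query, and min-aggregation are all hypothetical choices, not Med-PRM's published implementation. The keyword-overlap retriever and scorer are toy stand-ins for a real retriever and a trained reward model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set

# Hypothetical interfaces; Med-PRM's real retriever and reward model differ.
Retriever = Callable[[str, int], List[str]]  # (query, k) -> top-k documents
StepScorer = Callable[[str, Sequence[str], str, List[str]], float]
# StepScorer args: (question, previous steps, current step, evidence);
# returns an estimated probability that the current step is correct.


@dataclass
class Candidate:
    steps: List[str]  # reasoning steps sampled from a policy model
    answer: str       # final answer of this reasoning chain


def score_solution(question: str, cand: Candidate, retrieve: Retriever,
                   score_step: StepScorer, k: int = 1) -> float:
    """Score each step against retrieved evidence, then aggregate."""
    scores = []
    for i, step in enumerate(cand.steps):
        # Assumption: query the knowledge base with the question plus the step.
        evidence = retrieve(f"{question} {step}", k)
        scores.append(score_step(question, cand.steps[:i], step, evidence))
    # Assumption: min-aggregation, so one unsupported step sinks the chain.
    return min(scores, default=0.0)


def best_of_n(question: str, candidates: Sequence[Candidate],
              retrieve: Retriever, score_step: StepScorer,
              k: int = 1) -> Candidate:
    """Plug-and-play reranking: keep the candidate with the highest score."""
    return max(candidates, key=lambda c: score_solution(
        question, c, retrieve, score_step, k))


# Toy stand-ins so the sketch runs end to end (purely illustrative).
CORPUS = [
    "anaphylaxis first line treatment is intramuscular epinephrine",
    "antihistamines are adjunct therapy and do not replace epinephrine",
]


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def toy_retrieve(query: str, k: int) -> List[str]:
    """Rank corpus documents by word overlap with the query."""
    q = _tokens(query)
    return sorted(CORPUS, key=lambda d: -len(q & _tokens(d)))[:k]


def toy_score_step(question: str, prev_steps: Sequence[str], step: str,
                   evidence: List[str]) -> float:
    """Fraction of the step's words that appear in the retrieved evidence."""
    words = _tokens(step)
    support = set().union(*(_tokens(d) for d in evidence))
    return len(words & support) / len(words) if words else 0.0


if __name__ == "__main__":
    q = "what is the first line treatment for anaphylaxis"
    good = Candidate(["anaphylaxis needs epinephrine",
                      "give intramuscular epinephrine"], "epinephrine")
    bad = Candidate(["anaphylaxis needs antihistamines",
                     "give oral diphenhydramine only"], "diphenhydramine")
    print(best_of_n(q, [bad, good], toy_retrieve, toy_score_step).answer)
    # prints: epinephrine
```

Min-aggregation is only one option: the product of step probabilities or the score of the final step are other common ways to collapse step-level rewards into a single solution score, and the choice changes how harshly a single weak step is penalized.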

