Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards

Abstract

Large language models have shown promise in clinical decision making, but current approaches struggle to localize and correct errors at specific steps of the reasoning process. This limitation is critical in medicine, where identifying and addressing reasoning errors is essential for accurate diagnosis and effective patient care. We introduce Med-PRM, a process reward modeling framework that leverages retrieval-augmented generation to verify each reasoning step against established medical knowledge bases. By verifying intermediate reasoning steps with evidence retrieved from clinical guidelines and literature, our model can precisely assess reasoning quality in a fine-grained manner. Evaluations on five medical QA benchmarks and two open-ended diagnostic tasks demonstrate that Med-PRM achieves state-of-the-art performance, improving the performance of base models by up to 13.50%. Moreover, we demonstrate the generality of Med-PRM by integrating it in a plug-and-play fashion with strong policy models such as Meerkat, achieving over 80% accuracy on MedQA for the first time using small-scale models of 8 billion parameters. Our code and data are available at https://med-prm.github.io/
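
The abstract does not say how step-level scores are combined or how the reward model is attached to a policy model. One common way to picture a plug-and-play process reward model is best-of-N reranking: a policy model proposes several step-by-step solutions, each step is scored against retrieved evidence, and the best-scoring candidate is kept. The sketch below illustrates only that general pattern. The names `Candidate`, `score_candidate`, `best_of_n`, and `toy_step_scorer` are hypothetical, the min-aggregation rule is an assumption, and the word-overlap scorer is a toy stand-in for a learned reward model, not the Med-PRM implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical signature: (question, step, evidence documents) -> score in [0, 1].
StepScorer = Callable[[str, str, Sequence[str]], float]


@dataclass
class Candidate:
    """One step-by-step solution proposed by a policy model."""
    steps: List[str]
    answer: str


def _words(text: str) -> set:
    """Lowercased words with trailing periods and commas stripped."""
    return {w.strip(".,").lower() for w in text.split()}


def toy_step_scorer(question: str, step: str, evidence: Sequence[str]) -> float:
    """Toy stand-in for a learned reward model: the fraction of the step's
    words that also appear in the retrieved evidence. The question is unused."""
    step_words = _words(step)
    evidence_words = set()
    for doc in evidence:
        evidence_words |= _words(doc)
    return len(step_words & evidence_words) / len(step_words) if step_words else 0.0


def score_candidate(question: str, candidate: Candidate,
                    evidence: Sequence[str], scorer: StepScorer) -> float:
    """Score every step, then aggregate with min (an assumption): a solution
    is only as trustworthy as its weakest step."""
    if not candidate.steps:
        return 0.0
    return min(scorer(question, step, evidence) for step in candidate.steps)


def best_of_n(question: str, candidates: Sequence[Candidate],
              evidence: Sequence[str], scorer: StepScorer) -> Candidate:
    """Return the candidate with the highest aggregated step score."""
    return max(candidates,
               key=lambda c: score_candidate(question, c, evidence, scorer))


# Illustrative data only; in a real system the evidence would come from a retriever.
question = "What is the first-line drug therapy for type 2 diabetes?"
evidence = ["Metformin is first-line therapy for type 2 diabetes."]
candidates = [
    Candidate(steps=["Insulin is always first-line."], answer="insulin"),
    Candidate(steps=["Type 2 diabetes first-line therapy is metformin."],
              answer="metformin"),
]
best = best_of_n(question, candidates, evidence, toy_step_scorer)
print(best.answer)
```

Because the scorer is passed in as a function, the same reranking loop works with any policy model that emits stepwise solutions, which is the sense of "plug-and-play" assumed in this sketch.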
