DrugReasoner: Interpretable Drug Approval Prediction with a Reasoning-augmented Language Model

Abstract

Drug discovery is a complex and resource-intensive process, making early prediction of approval outcomes critical for optimizing research investments. While classical machine learning and deep learning methods have shown promise in drug approval prediction, their limited interpretability constrains their impact. Here, we present DrugReasoner, a reasoning-based large language model (LLM) built on the LLaMA architecture and fine-tuned with group relative policy optimization (GRPO) to predict the likelihood of small-molecule approval. DrugReasoner integrates molecular descriptors with comparative reasoning against structurally similar approved and unapproved compounds, generating predictions alongside step-by-step rationales and confidence scores. DrugReasoner achieved robust performance, with an AUC of 0.732 and an F1 score of 0.729 on the validation set and an AUC of 0.725 and an F1 score of 0.718 on the test set. These results surpassed conventional baselines, including logistic regression, support vector machine, and k-nearest neighbors, and were competitive with XGBoost. On an external independent dataset, DrugReasoner outperformed both the baseline models and the recently developed ChemAP model, achieving an AUC of 0.728 and an F1 score of 0.774 while maintaining high precision and balanced sensitivity, which demonstrates its robustness in real-world scenarios. These findings show that DrugReasoner not only delivers competitive predictive accuracy but also enhances transparency through its reasoning outputs, thereby addressing a key bottleneck in AI-assisted drug discovery. This study highlights the potential of reasoning-augmented LLMs as interpretable and effective tools for pharmaceutical decision-making.
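In GRPO, the policy samples a group of completions for the same prompt and scores each completion relative to the rest of its group instead of relying on a learned value function. The minimal sketch below shows only that group-relative advantage step. The abstract does not describe DrugReasoner's actual reward, so `hypothetical_reward` (1.0 for a correct approved/unapproved call, else 0.0) is an illustrative assumption, and the clipped policy-ratio objective and KL penalty of full GRPO are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage of each completion = (reward - group mean) / group std.

    Uses the population std over the group; some implementations use the
    sample std instead. eps avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def hypothetical_reward(predicted_label: str, true_label: str) -> float:
    # HYPOTHETICAL reward, not taken from the paper: binary label correctness.
    return 1.0 if predicted_label == true_label else 0.0


if __name__ == "__main__":
    # Four sampled completions for one compound whose true label is "approved".
    predictions = ["approved", "unapproved", "approved", "approved"]
    rewards = [hypothetical_reward(p, "approved") for p in predictions]
    print([round(a, 3) for a in group_relative_advantages(rewards)])
    # → [0.577, -1.732, 0.577, 0.577]
```

Completions that beat their group's average receive positive advantages and are reinforced, while the incorrect one is pushed down. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.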