DrugReasoner: Interpretable Drug Approval Prediction with a Reasoning-augmented Language Model

Abstract

Drug discovery is a complex and resource-intensive process, making early prediction of approval outcomes critical for optimizing research investments. While classical machine learning and deep learning methods have shown promise in drug approval prediction, their limited interpretability constrains their impact. Here, we present DrugReasoner, a reasoning-based large language model (LLM) built on the LLaMA architecture and fine-tuned with group relative policy optimization (GRPO) to predict the likelihood of small-molecule approval. DrugReasoner integrates molecular descriptors with comparative reasoning against structurally similar approved and unapproved compounds, generating predictions alongside step-by-step rationales and confidence scores. DrugReasoner achieved robust performance, with an AUC of 0.732 and an F1 score of 0.729 on the validation set, and an AUC of 0.725 and an F1 score of 0.718 on the test set. It outperformed conventional baselines, including logistic regression, support vector machine, and k-nearest neighbors, and was competitive with XGBoost. On an external independent dataset, DrugReasoner outperformed both the baselines and the recently developed ChemAP model, achieving an AUC of 0.728 and an F1 score of 0.774 while maintaining high precision and balanced sensitivity, which demonstrates robustness in real-world scenarios. These findings show that DrugReasoner not only delivers competitive predictive accuracy but also enhances transparency through its reasoning outputs, thereby addressing a key bottleneck in AI-assisted drug discovery. This study highlights the potential of reasoning-augmented LLMs as interpretable and effective tools for pharmaceutical decision-making.
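For readers less familiar with the two headline metrics, the following minimal Python sketch shows how an AUC and an F1 score can be computed from per-compound confidence scores. It is an illustration only, not the authors' evaluation code: it assumes the confidence score is treated as a probability of approval, and the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 (harmonic mean of precision and recall) after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (not from the paper): 1 = approved, 0 = not approved.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
f1 = f1_score(labels, scores)
print(f"AUC = {auc:.3f}, F1 = {f1:.3f}")
```

AUC depends only on how the scores rank the compounds, whereas F1 depends on the chosen threshold, so a model can rank higher on one metric than on the other.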
