Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models

Abstract

We describe the accurate prediction of ligand-protein interaction (LPI) affinities, also known as drug-target interactions (DTI), with instruction fine-tuned pretrained generative small language models (SLMs). We achieved accurate predictions for a range of affinity values associated with ligand-protein interactions on out-of-sample data in a zero-shot setting. Only the SMILES string of the ligand and the amino acid sequence of the protein were used as the model inputs. Our results demonstrate a clear improvement over machine learning (ML) and free-energy perturbation (FEP+) based methods in accurately predicting a range of ligand-protein interaction affinities, which can be leveraged to further accelerate drug discovery campaigns against challenging therapeutic targets.
