Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models

Abstract

We describe the accurate prediction of affinities for ligand-protein interactions (LPI), also known as drug-target interactions (DTI), using instruction fine-tuned, pretrained generative small language models (SLMs). On out-of-sample data in a zero-shot setting, the models accurately predicted a range of affinity values associated with ligand-protein interactions. The only model inputs were the SMILES string of the ligand and the amino acid sequence of the protein. Our results demonstrate a clear improvement over machine learning (ML) and free-energy perturbation (FEP+) based methods in predicting a range of ligand-protein interaction affinities, which can be leveraged to further accelerate drug discovery campaigns against challenging therapeutic targets.
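As a rough illustration of the input format described above, the sketch below packs a ligand SMILES string and a protein amino acid sequence into a single instruction-style text prompt. The prompt wording, the function name `build_affinity_prompt`, the `affinity_label` parameter, and the restriction to the 20 standard one-letter residue codes are all assumptions made for this example; the abstract does not specify the actual template or preprocessing.

```python
# Minimal sketch, NOT the authors' pipeline: the template and all names
# below are hypothetical and serve only to illustrate the two model inputs.

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def build_affinity_prompt(smiles: str, protein_sequence: str,
                          affinity_label: str = "Kd") -> str:
    """Format one ligand-protein pair as an instruction-style prompt."""
    smiles = smiles.strip()
    # Remove whitespace and line breaks, then normalise to upper case.
    sequence = "".join(protein_sequence.split()).upper()

    if not smiles:
        raise ValueError("SMILES string must not be empty")
    if not sequence:
        raise ValueError("protein sequence must not be empty")
    unknown = set(sequence) - STANDARD_RESIDUES
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")

    # Hypothetical instruction template; the real wording is not given.
    return (
        f"Predict the {affinity_label} affinity for the ligand-protein pair.\n"
        f"Ligand SMILES: {smiles}\n"
        f"Protein sequence: {sequence}\n"
        "Affinity:"
    )


# Usage: aspirin paired with a short toy peptide (not a real target).
print(build_affinity_prompt("CC(=O)OC1=CC=CC=C1C(=O)O", "MKTAYIAKQR"))
```

A fine-tuned generative model would be expected to complete text of this kind with a numeric affinity value. No structural information such as 3D coordinates or docking poses appears in the input, which matches the sequence-only setting the abstract describes.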
