PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs

July 7, 2025
Authors: Xinzhe Zheng, Hao Du, Fanding Xu, Jinzhe Li, Zhiyuan Liu, Wenkang Wang, Tao Chen, Wanli Ouyang, Stan Z. Li, Yan Lu, Nanqing Dong, Yang Zhang
cs.AI

Abstract

Deep learning-based computational methods have achieved promising results in predicting protein-protein interactions (PPIs). However, existing benchmarks focus predominantly on isolated pairwise evaluations and overlook a model's ability to reconstruct biologically meaningful PPI networks, which is crucial for biological research. To address this gap, we introduce PRING, the first comprehensive benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions, with well-designed strategies to address both data redundancy and leakage. Building on this gold-standard dataset, we establish two complementary evaluation paradigms: (1) topology-oriented tasks, which assess intra- and cross-species PPI network construction, and (2) function-oriented tasks, including protein complex pathway prediction, GO module analysis, and essential protein justification. These evaluations not only reflect a model's ability to understand network topology but also facilitate protein function annotation, biological module detection, and even disease mechanism analysis. Extensive experiments on four representative model categories (sequence similarity-based, naive sequence-based, protein language model-based, and structure-based approaches) demonstrate that current PPI models have potential limitations in recovering both the structural and functional properties of PPI networks, highlighting a gap in supporting real-world biological applications. We believe PRING provides a reliable platform to guide the development of more effective PPI prediction models for the community. The dataset and source code of PRING are available at https://github.com/SophieSarceau/PRING.
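
The contrast between pairwise and graph-level evaluation can be illustrated with a toy sketch. The code below is not PRING's implementation or its metrics; the function names, the 0.5 threshold, and the four-protein example are all hypothetical and chosen only for illustration. It scores every protein pair, assembles the predicted edges into a network, and then compares that network with the reference both edge by edge (F1) and by simple topological summaries (density and node degree).

```python
from itertools import combinations


def build_graph(proteins, score_fn, threshold=0.5):
    """Score every unordered protein pair; keep pairs at or above the threshold as edges."""
    return {
        frozenset(pair)
        for pair in combinations(proteins, 2)
        if score_fn(*pair) >= threshold
    }


def edge_f1(pred_edges, true_edges):
    """Edge-level F1 between predicted and reference edge sets."""
    tp = len(pred_edges & true_edges)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)


def density(n_nodes, edges):
    """Fraction of all possible undirected edges that are present."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def degrees(proteins, edges):
    """Number of interaction partners per protein."""
    deg = {p: 0 for p in proteins}
    for edge in edges:
        for p in edge:
            deg[p] += 1
    return deg


# Hypothetical toy data: a reference path A-B-C-D and made-up pairwise scores.
proteins = ["A", "B", "C", "D"]
true_edges = {frozenset(("A", "B")), frozenset(("B", "C")), frozenset(("C", "D"))}
toy_scores = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.7,  # false positive
    frozenset(("C", "D")): 0.2,  # missed true interaction
}


def toy_score(a, b):
    """Stand-in for a pairwise PPI model; unlisted pairs score 0.0."""
    return toy_scores.get(frozenset((a, b)), 0.0)


pred_edges = build_graph(proteins, toy_score)

print("edge F1:", edge_f1(pred_edges, true_edges))
print("density (true, pred):", density(4, true_edges), density(4, pred_edges))
print("degrees (true):", degrees(proteins, true_edges))
print("degrees (pred):", degrees(proteins, pred_edges))
```

In this toy case the predicted network matches the reference density and scores reasonably on edges, yet protein D ends up with no partners at all, which is the kind of structural discrepancy that a pair-level score alone does not expose.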