

PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs

July 7, 2025
Authors: Xinzhe Zheng, Hao Du, Fanding Xu, Jinzhe Li, Zhiyuan Liu, Wenkang Wang, Tao Chen, Wanli Ouyang, Stan Z. Li, Yan Lu, Nanqing Dong, Yang Zhang
cs.AI

Abstract

Deep learning-based computational methods have achieved promising results in predicting protein-protein interactions (PPIs). However, existing benchmarks predominantly focus on isolated pairwise evaluations, overlooking a model's capability to reconstruct biologically meaningful PPI networks, which is crucial for biological research. To address this gap, we introduce PRING, the first comprehensive benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions, with well-designed strategies to address both data redundancy and leakage. Building on this gold-standard dataset, we establish two complementary evaluation paradigms: (1) topology-oriented tasks, which assess intra- and cross-species PPI network construction, and (2) function-oriented tasks, including protein complex pathway prediction, GO module analysis, and essential protein justification. These evaluations not only reflect the model's capability to understand the network topology but also facilitate protein function annotation, biological module detection, and even disease mechanism analysis. Extensive experiments on four representative model categories, consisting of sequence similarity-based, naive sequence-based, protein language model-based, and structure-based approaches, demonstrate that current PPI models have potential limitations in recovering both structural and functional properties of PPI networks, highlighting a gap in supporting real-world biological applications. We believe PRING provides a reliable platform to guide the development of more effective PPI prediction models for the community. The dataset and source code of PRING are available at https://github.com/SophieSarceau/PRING.
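To make the pairs-to-graphs distinction concrete, here is a minimal sketch under stated assumptions; it is not PRING's actual pipeline, metrics, or API. A hypothetical pairwise scorer is thresholded into a predicted network, which is then compared with a ground-truth network in two ways: edge-level precision/recall (the pairwise view) and one topological statistic, the average local clustering coefficient (a graph-level view). The function names, the 0.5 threshold, and the toy scores are all illustrative assumptions.

```python
from itertools import combinations

def canon(a, b):
    """Store an undirected interaction as a sorted tuple so (A, B) == (B, A)."""
    return (a, b) if a <= b else (b, a)

def build_graph(proteins, score_fn, threshold=0.5):
    """Score every protein pair with a (hypothetical) model; keep pairs >= threshold as edges."""
    return {canon(a, b) for a, b in combinations(proteins, 2) if score_fn(a, b) >= threshold}

def adjacency(edges):
    """Turn an edge set into a neighbour map."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def average_clustering(edges, nodes):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    adj = adjacency(edges)
    total = 0.0
    for n in nodes:
        nbrs = adj.get(n, set())
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the neighbours of n (closed triangles through n).
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(nodes)

def edge_precision_recall(pred, true):
    """Pairwise view: share of predicted edges that are real, and of real edges recovered."""
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall

# Toy example (hypothetical data): a triangle A-B-C plus the edge C-D.
proteins = ["A", "B", "C", "D"]
true_edges = {canon(*e) for e in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]}
# Hypothetical model scores; unlisted pairs score 0.0.
scores = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.3, ("C", "D"): 0.7, ("A", "D"): 0.2}
pred_edges = build_graph(proteins, lambda a, b: scores.get(canon(a, b), 0.0))

p, r = edge_precision_recall(pred_edges, true_edges)
print(f"edge precision={p:.2f} recall={r:.2f}")  # edge precision=1.00 recall=0.75
print(f"avg clustering true={average_clustering(true_edges, proteins):.3f} "
      f"pred={average_clustering(pred_edges, proteins):.3f}")  # avg clustering true=0.583 pred=0.000
```

In this toy case the prediction has no false positives and recovers three of the four true interactions, yet missing the single B-C edge breaks the only triangle, so the average clustering coefficient drops from about 0.583 to 0. A pairwise score alone does not reveal that kind of structural loss, which is the motivation for evaluating reconstructed networks as graphs.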