PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs

Abstract

Deep learning-based computational methods have achieved promising results in predicting protein-protein interactions (PPIs). However, existing benchmarks predominantly focus on isolated pairwise evaluations, overlooking a model's capability to reconstruct biologically meaningful PPI networks, which is crucial for biological research. To address this gap, we introduce PRING, the first comprehensive benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions, with well-designed strategies to address both data redundancy and leakage. Building on this gold-standard dataset, we establish two complementary evaluation paradigms: (1) topology-oriented tasks, which assess intra- and cross-species PPI network construction, and (2) function-oriented tasks, including protein complex pathway prediction, GO module analysis, and essential protein justification. These evaluations not only reflect a model's capability to understand network topology but also facilitate protein function annotation, biological module detection, and even disease mechanism analysis. Extensive experiments on four representative model categories (sequence similarity-based, naive sequence-based, protein language model-based, and structure-based approaches) demonstrate that current PPI models have potential limitations in recovering both the structural and functional properties of PPI networks, highlighting a gap in supporting real-world biological applications. We believe PRING provides a reliable platform to guide the community in developing more effective PPI prediction models. The dataset and source code of PRING are available at https://github.com/SophieSarceau/PRING.
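The distinction between scoring isolated pairs and recovering a network can be illustrated with a toy sketch. It assumes a hypothetical pairwise scorer (`score_fn`, backed here by made-up scores) and an arbitrary threshold of 0.5; neither comes from PRING. The code predicts every protein pair, assembles the kept pairs into a graph, and compares that graph with a reference network using a pairwise view (edge precision/recall) and one graph-level view (average clustering coefficient). This is not PRING's evaluation protocol, and the metric choice is purely illustrative.

```python
from collections import defaultdict
from itertools import combinations

def norm(a, b):
    """Store each undirected edge as an ordered tuple so (A, B) == (B, A)."""
    return (a, b) if a <= b else (b, a)

def build_network(proteins, score_fn, threshold=0.5):
    """Score every unordered protein pair; keep pairs at or above the (assumed) threshold."""
    return {norm(a, b) for a, b in combinations(sorted(proteins), 2)
            if score_fn(a, b) >= threshold}

def edge_precision_recall(pred_edges, true_edges):
    """Pairwise view: share of predicted edges that are real, and of real edges recovered."""
    tp = len(pred_edges & true_edges)
    precision = tp / len(pred_edges) if pred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    return precision, recall

def average_clustering(proteins, edges):
    """Graph view: mean local clustering coefficient; nodes with degree < 2 count as 0."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    total = 0.0
    for p in proteins:
        nbrs = adj[p]
        k = len(nbrs)
        if k < 2:
            continue  # contributes 0 to the sum but still counts in the mean
        # Count how many neighbour pairs are themselves connected (closed triangles).
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(proteins) if proteins else 0.0

# Hypothetical toy data: four proteins, a reference network, and made-up model scores.
proteins = ["A", "B", "C", "D"]
true_edges = {norm(*e) for e in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]}
scores = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.3,
          ("C", "D"): 0.7, ("A", "D"): 0.4, ("B", "D"): 0.1}

pred = build_network(proteins, lambda a, b: scores[norm(a, b)])
print(edge_precision_recall(pred, true_edges))   # pairwise metrics
print(average_clustering(proteins, true_edges))  # topology of the reference network
print(average_clustering(proteins, pred))        # topology of the predicted network
```

In this toy case every predicted edge is correct (precision 1.0, recall 0.75), yet the single missed edge B-C removes the only triangle, so the average clustering coefficient falls from about 0.583 to 0.0. A pairwise metric on its own would not reveal that change in network structure.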
