Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing

Abstract

De novo peptide sequencing is a critical task in proteomics. However, the performance of current deep learning-based methods is limited by the inherent complexity of mass spectrometry data and the heterogeneous distribution of noise signals, which leads to data-specific biases. We present RankNovo, the first deep reranking framework that enhances de novo peptide sequencing by leveraging the complementary strengths of multiple sequencing models. RankNovo employs a list-wise reranking approach: it models the candidate peptides as a multiple sequence alignment and uses axial attention to extract informative features across candidates. Additionally, we introduce two new metrics, PMD (Peptide Mass Deviation) and RMD (Residue Mass Deviation), which provide fine-grained supervision by quantifying mass differences between peptides at the sequence and residue levels. Extensive experiments demonstrate that RankNovo not only surpasses the base models used to generate its training candidates, but also sets a new state-of-the-art benchmark. Moreover, RankNovo exhibits strong zero-shot generalization to unseen models whose outputs were not seen during training, highlighting its robustness and potential as a universal reranking framework for peptide sequencing. Our work presents a novel reranking strategy that fundamentally challenges existing single-model paradigms and advances the frontier of accurate de novo sequencing. Our source code is provided on GitHub.
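To make the idea of a mass difference "at the sequence and residue levels" concrete, the following is a minimal Python sketch. It is not the definition of PMD or RMD, which the abstract does not give; the function names `peptide_mass`, `peptide_mass_gap`, and `residue_mass_gaps` are hypothetical, and the residue-level comparison assumes two unmodified peptides of equal length compared position by position. The residue masses are the standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added for the peptide termini


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def peptide_mass_gap(a: str, b: str) -> float:
    """Sequence-level gap: absolute difference of total peptide masses.

    Illustrative only; an assumption, not the paper's PMD formula.
    """
    return abs(peptide_mass(a) - peptide_mass(b))


def residue_mass_gaps(a: str, b: str) -> list[float]:
    """Residue-level gaps: per-position absolute residue mass differences.

    Illustrative only; assumes equal-length peptides and is not the
    paper's RMD formula.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length peptides")
    return [abs(RESIDUE_MASS[x] - RESIDUE_MASS[y]) for x, y in zip(a, b)]


if __name__ == "__main__":
    # I and L are isobaric, so swapping them leaves the mass unchanged.
    print(peptide_mass_gap("PEPTIDE", "PEPTLDE"))
    # I -> N changes the mass by a little under 1 Da.
    print(peptide_mass_gap("PEPTIDE", "PEPTNDE"))
    print(residue_mass_gaps("PEPTIDE", "PEPTNDE"))
```

A mass-based signal of this kind distinguishes a candidate that differs from the reference by an isobaric swap (zero gap) from one that differs by a residue with a clearly different mass, which a plain exact-match label cannot do.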
