Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing
May 23, 2025
Authors: Zijie Qiu, Jiaqi Wei, Xiang Zhang, Sheng Xu, Kai Zou, Zhi Jin, Zhiqiang Gao, Nanqing Dong, Siqi Sun
cs.AI
Abstract
De novo peptide sequencing is a critical task in proteomics. However, the
performance of current deep learning-based methods is limited by the inherent
complexity of mass spectrometry data and the heterogeneous distribution of
noise signals, leading to data-specific biases. We present RankNovo, the first
deep reranking framework that enhances de novo peptide sequencing by leveraging
the complementary strengths of multiple sequencing models. RankNovo employs a
list-wise reranking approach, modeling candidate peptides as multiple sequence
alignments and utilizing axial attention to extract informative features across
candidates. Additionally, we introduce two new metrics, PMD (Peptide Mass
Deviation) and RMD (Residual Mass Deviation), which provide fine-grained supervision
by quantifying mass differences between peptides at both the sequence and
residue levels. Extensive experiments demonstrate that RankNovo not only
surpasses the base models that generated its training candidates for reranking
pre-training, but also sets a new state-of-the-art benchmark. Moreover,
RankNovo exhibits strong zero-shot generalization to models whose outputs
were never seen during training, highlighting its robustness and
potential as a universal reranking framework for peptide sequencing. Our work
presents a novel reranking strategy that fundamentally challenges existing
single-model paradigms and advances the frontier of accurate de novo
sequencing. Our source code is available on GitHub.
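The abstract names PMD and RMD but gives no formulas, so the following is only a rough illustration of what "mass differences at the sequence and residue levels" could mean. It assumes, hypothetically, that PMD behaves like a mass-weighted edit distance (substituting residue a for b costs |m(a) - m(b)|, and an unmatched residue costs its full mass) and that RMD assigns each candidate residue its share of that cost along one optimal alignment. The names `pmd`, `rmd`, and `RESIDUE_MASS` are invented for this sketch and are not RankNovo's code; the masses are standard unmodified monoisotopic residue masses.

```python
"""Hypothetical sketch of mass-based peptide deviation metrics.

ASSUMPTION: the RankNovo abstract does not give formulas for PMD or RMD.
Here PMD is modeled as a mass-weighted edit distance and RMD as each
candidate residue's share of that cost; this is illustrative only.
"""
import math

# Standard monoisotopic residue masses in daltons (no modifications).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def _alignment_table(cand, ref):
    """dp[i][j] = cheapest mass-weighted alignment of cand[:i] with ref[:j]."""
    m = RESIDUE_MASS
    dp = [[0.0] * (len(ref) + 1) for _ in range(len(cand) + 1)]
    for i in range(1, len(cand) + 1):
        dp[i][0] = dp[i - 1][0] + m[cand[i - 1]]  # unmatched candidate residue
    for j in range(1, len(ref) + 1):
        dp[0][j] = dp[0][j - 1] + m[ref[j - 1]]  # unmatched reference residue
    for i in range(1, len(cand) + 1):
        for j in range(1, len(ref) + 1):
            a, b = cand[i - 1], ref[j - 1]
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(m[a] - m[b]),  # substitution
                dp[i - 1][j] + m[a],  # extra residue in candidate
                dp[i][j - 1] + m[b],  # residue missing from candidate
            )
    return dp


def pmd(cand, ref):
    """Sequence-level deviation (Da): total cost of the best alignment."""
    return _alignment_table(cand, ref)[len(cand)][len(ref)]


def rmd(cand, ref):
    """Residue-level deviation (Da) for each candidate residue.

    Walks back along one optimal alignment, preferring substitutions on ties.
    Reference residues missing from the candidate are not attributed to any
    candidate position in this sketch.
    """
    m = RESIDUE_MASS
    dp = _alignment_table(cand, ref)
    per_residue = [0.0] * len(cand)
    i, j = len(cand), len(ref)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = abs(m[cand[i - 1]] - m[ref[j - 1]])
            if math.isclose(dp[i][j], dp[i - 1][j - 1] + sub, abs_tol=1e-9):
                per_residue[i - 1] = sub
                i, j = i - 1, j - 1
                continue
        extra = m[cand[i - 1]] if i > 0 else None
        if extra is not None and math.isclose(dp[i][j], dp[i - 1][j] + extra, abs_tol=1e-9):
            per_residue[i - 1] = extra  # candidate residue with no partner
            i -= 1
        else:
            j -= 1  # reference residue absent from the candidate
    return per_residue


if __name__ == "__main__":
    print(pmd("PEPTIDE", "PEPTLDE"))  # 0.0: I and L have the same mass
    print(rmd("AG", "AS"))  # the whole deviation lands on the last residue
```

Under these assumed definitions, swapping the isobaric residues I and L costs nothing, while `rmd("AG", "AS")` puts roughly 30.01 Da on the final residue and 0.0 on the first. Optimal alignments can tie (for example, when an extra residue could be absorbed by either a deletion or a substitution), so the per-residue split depends on the tie-breaking order chosen in the backtrace.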