Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing

Abstract

De novo peptide sequencing is a critical task in proteomics. However, current deep learning-based methods are limited by the inherent complexity of mass spectrometry data and the heterogeneous distribution of noise signals, which lead to data-specific biases. We present RankNovo, the first deep reranking framework that enhances de novo peptide sequencing by leveraging the complementary strengths of multiple sequencing models. RankNovo employs a list-wise reranking approach, modeling candidate peptides as multiple sequence alignments and using axial attention to extract informative features across candidates. Additionally, we introduce two new metrics, PMD (Peptide Mass Deviation) and RMD (Residual Mass Deviation), which provide fine-grained supervision by quantifying mass differences between peptides at the sequence and residue levels. Extensive experiments demonstrate that RankNovo not only surpasses the base models used to generate training candidates for reranking pre-training, but also sets a new state of the art. Moreover, RankNovo exhibits strong zero-shot generalization to unseen models whose generations were not exposed during training, highlighting its robustness and potential as a universal reranking framework for peptide sequencing. Our work presents a novel reranking strategy that fundamentally challenges existing single-model paradigms and advances the frontier of accurate de novo sequencing. Our source code is provided on GitHub.
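To make the idea of measuring mass differences at two granularities concrete, the following is a minimal sketch in Python. It is not the paper's definition of PMD or RMD, which the abstract does not spell out. The function names, the position-wise comparison, and the zero-mass padding for unequal lengths are all assumptions made for illustration. Only the monoisotopic residue masses and the mass of water are standard values.

```python
from itertools import zip_longest

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a free peptide


def peptide_mass(seq: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def sequence_level_deviation(a: str, b: str) -> float:
    """Hypothetical sequence-level score: absolute difference of total masses."""
    return abs(peptide_mass(a) - peptide_mass(b))


def residue_level_deviation(a: str, b: str) -> list[float]:
    """Hypothetical residue-level score: position-wise absolute mass differences.

    Positions missing from the shorter peptide are treated as mass 0.0
    (an assumption of this sketch, not of the paper).
    """
    return [
        abs(RESIDUE_MASS.get(x, 0.0) - RESIDUE_MASS.get(y, 0.0))
        for x, y in zip_longest(a, b, fillvalue="")
    ]
```

Under these toy definitions, the candidates `"GA"` and `"AG"` have a sequence-level deviation of zero because they contain the same residues, yet every position differs at the residue level. This is one reason a reranker might benefit from supervision at both levels. Conversely, `"PEPTIDE"` and `"PEPTLDE"` differ only by an isoleucine/leucine swap, which is isobaric and therefore invisible to both scores.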
