ProteinBench: A Holistic Evaluation of Protein Foundation Models

Abstract

Recent years have witnessed a surge in the development of protein foundation models, significantly improving performance in protein prediction and generative tasks ranging from 3D structure prediction and protein design to conformational dynamics. However, the capabilities and limitations of these models remain poorly understood due to the absence of a unified evaluation framework. To fill this gap, we introduce ProteinBench, a holistic evaluation framework designed to enhance the transparency of protein foundation models. Our approach consists of three key components: (i) a taxonomic classification of tasks that broadly encompasses the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) a multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) in-depth analyses from the perspective of various user objectives, providing a holistic view of model performance. Our comprehensive evaluation of protein foundation models reveals several key findings that shed light on their current capabilities and limitations. To promote transparency and facilitate further research, we publicly release the evaluation dataset, code, and a leaderboard for further analysis, along with a general modular toolkit. We intend for ProteinBench to be a living benchmark that establishes a standardized, in-depth evaluation framework for protein foundation models, driving their development and application while fostering collaboration within the field.