SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models
March 4, 2025
Authors: Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova
cs.AI
Abstract
Advancing AI in computational pathology requires large, high-quality, and
diverse datasets, yet existing public datasets are often limited in organ
diversity, class coverage, or annotation quality. To bridge this gap, we
introduce SPIDER (Supervised Pathology Image-DEscription Repository), the
largest publicly available patch-level dataset covering multiple organ types,
including Skin, Colorectal, and Thorax, with comprehensive class coverage for
each organ. SPIDER provides high-quality annotations verified by expert
pathologists and includes surrounding context patches, which enhance
classification performance by providing spatial context.
Alongside the dataset, we present baseline models trained on SPIDER using the
Hibou-L foundation model as a feature extractor combined with an
attention-based classification head. The models achieve state-of-the-art
performance across multiple tissue categories and serve as strong benchmarks
for future digital pathology research. Beyond patch classification, the model
enables rapid identification of significant areas and quantitative tissue
metrics, and establishes a foundation for multimodal approaches.
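
To illustrate how an attention-based classification head can combine a central patch with its surrounding context patches, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the head's design, so the gated-tanh scoring, all dimensions, the random weights, and every function name below are assumptions chosen for illustration. The random vectors stand in for embeddings that a frozen feature extractor such as Hibou-L would produce.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attention_pool(embeddings, proj, query):
    """Pool patch embeddings with learned attention weights.

    score_i = query . tanh(proj @ e_i); weights = softmax(scores).
    Returns the weighted-average embedding and the weights.
    """
    scores = []
    for e in embeddings:
        hidden = [math.tanh(h) for h in matvec(proj, e)]
        scores.append(sum(q * h for q, h in zip(query, hidden)))
    weights = softmax(scores)
    dim = len(embeddings[0])
    pooled = [sum(w * e[d] for w, e in zip(weights, embeddings))
              for d in range(dim)]
    return pooled, weights


def classify(embeddings, proj, query, class_weights):
    """Attention-pool the embeddings, then apply a linear classifier."""
    pooled, weights = attention_pool(embeddings, proj, query)
    logits = matvec(class_weights, pooled)
    return softmax(logits), weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes, far smaller than any real foundation-model embedding.
EMBED_DIM, HIDDEN_DIM, NUM_CLASSES, NUM_PATCHES = 8, 4, 3, 9

rng = random.Random(0)
# Stand-ins for extractor outputs: one central patch plus its context patches.
embeddings = random_matrix(NUM_PATCHES, EMBED_DIM, rng)
proj = random_matrix(HIDDEN_DIM, EMBED_DIM, rng)
query = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_DIM)]
class_weights = random_matrix(NUM_CLASSES, EMBED_DIM, rng)

probs, attn = classify(embeddings, proj, query, class_weights)
print("class probabilities:", [round(p, 3) for p in probs])
print("attention weights:", [round(a, 3) for a in attn])
```

In a trained head the attention weights would indicate which patches (central or context) contributed most to the prediction; with the untrained random weights used here they are close to uniform.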
Both the dataset and trained models are publicly available to advance
research, reproducibility, and AI-driven pathology development. Access them at:
https://github.com/HistAI/SPIDER