AtlasPatch: An Efficient and Scalable Tool for Whole Slide Image Preprocessing in Computational Pathology
February 3, 2026
Authors: Ahmed Alagha, Christopher Leclerc, Yousef Kotp, Omar Metwally, Calvin Moras, Peter Rentopoulos, Ghodsiyeh Rostami, Bich Ngoc Nguyen, Jumanah Baig, Abdelhakim Khellaf, Vincent Quoc-Huy Trinh, Rabeb Mizouni, Hadi Otrok, Jamal Bentahar, Mahdi S. Hosseini
cs.AI
Abstract
Whole-slide image (WSI) preprocessing, typically comprising tissue detection followed by patch extraction, is foundational to AI-driven computational pathology workflows. It remains a major computational bottleneck because existing tools either rely on inaccurate heuristic thresholding for tissue detection or adopt AI-based approaches that are trained on limited-diversity data and operate at the patch level, incurring substantial computational complexity. We present AtlasPatch, an efficient and scalable slide preprocessing framework for accurate tissue detection and high-throughput patch extraction with minimal computational overhead. AtlasPatch's tissue detection module is trained on a heterogeneous, semi-manually annotated dataset of ~30,000 WSI thumbnails using efficient fine-tuning of the Segment-Anything model. The tool extrapolates tissue masks from thumbnails to full-resolution slides to extract patch coordinates at user-specified magnifications, with options to stream patches directly into common image encoders for embedding or to store patch images, all efficiently parallelized across CPUs and GPUs. We assess AtlasPatch on segmentation precision, computational complexity, and downstream multiple-instance learning; it matches state-of-the-art performance at a fraction of the computational cost of existing tools. AtlasPatch is open-source and available at https://github.com/AtlasAnalyticsLab/AtlasPatch.
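The step of extrapolating a thumbnail tissue mask to full-resolution patch coordinates can be illustrated with a minimal sketch. This is not AtlasPatch's actual API or implementation: the function name `patch_coords_from_thumbnail_mask`, its parameters, the 40x base magnification, and the 50% tissue-fraction threshold are all assumptions chosen for illustration. The sketch lays a regular grid over the level-0 slide, maps each grid cell back into the thumbnail mask, and keeps the cells that contain enough tissue.

```python
import numpy as np

def patch_coords_from_thumbnail_mask(mask, slide_size, patch_size=256,
                                     base_mag=40.0, target_mag=20.0,
                                     min_tissue_frac=0.5):
    """Hypothetical helper: return level-0 (x, y) top-left corners of tissue patches.

    mask        2D boolean array, the tissue mask predicted on the thumbnail
    slide_size  (width, height) of the slide at full resolution (level 0)
    """
    slide_w, slide_h = slide_size
    mask_h, mask_w = mask.shape
    # A patch of patch_size pixels at target_mag covers this many level-0 pixels.
    step = int(round(patch_size * base_mag / target_mag))
    # Scale factors from level-0 pixel coordinates to thumbnail coordinates.
    sx, sy = mask_w / slide_w, mask_h / slide_h
    coords = []
    for y in range(0, slide_h - step + 1, step):
        for x in range(0, slide_w - step + 1, step):
            # Footprint of this patch inside the thumbnail mask (at least 1 pixel).
            x0, y0 = int(x * sx), int(y * sy)
            x1 = max(x0 + 1, int(np.ceil((x + step) * sx)))
            y1 = max(y0 + 1, int(np.ceil((y + step) * sy)))
            if mask[y0:y1, x0:x1].mean() >= min_tissue_frac:
                coords.append((x, y))
    return coords

# Toy example: a 4x4 thumbnail mask whose top-left quadrant is tissue,
# standing in for a 4096x4096 slide scanned at an assumed 40x.
toy_mask = np.zeros((4, 4), dtype=bool)
toy_mask[:2, :2] = True
toy_coords = patch_coords_from_thumbnail_mask(toy_mask, (4096, 4096))
```

Because only coordinates are computed from the low-resolution mask, full-resolution pixels need to be read only for the patches that are kept, which is one plausible reading of why thumbnail-level detection is cheaper than patch-level detection.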