

Medal S: Spatio-Textual Prompt Model for Medical Segmentation

November 17, 2025
Authors: Pengcheng Shi, Jiawei Chen, Jiaqi Liu, Xinglin Zhang, Tao Chen, Lei Li
cs.AI

Abstract

We introduce Medal S, a medical segmentation foundation model that supports native-resolution spatial and textual prompts within an end-to-end trainable framework. Unlike text-only methods lacking spatial awareness, Medal S achieves channel-wise alignment between volumetric prompts and text embeddings, mitigating inaccuracies from resolution mismatches. By preserving full 3D context, it efficiently processes multiple native-resolution masks in parallel, enhancing multi-class segmentation performance. A lightweight 3D convolutional module enables precise voxel-space refinement guided by both prompt types, supporting up to 243 classes across CT, MRI, PET, ultrasound, and microscopy modalities in the BiomedSegFM dataset. Medal S offers two prompting modes: a text-only mode, where model predictions serve as spatial prompts for self-refinement without human input, and a hybrid mode, incorporating manual annotations for enhanced flexibility. For 24-class segmentation, parallel spatial prompting reduces inference time by more than 90% compared to sequential prompting. We propose dynamic resampling to address target-patch ratio imbalance, extending SAT and nnU-Net for data augmentation. Furthermore, we develop optimized text preprocessing, a two-stage inference strategy, and post-processing techniques to improve memory efficiency, precision, and inference speed. Averaged over the five modalities on the validation set, Medal S outperforms SAT with a DSC of 75.44 (vs. 69.83), NSD of 77.34 (vs. 71.06), F1 of 38.24 (vs. 24.88), and DSC TP of 65.46 (vs. 46.97). Medal S achieves excellent performance by harmonizing spatial precision with semantic textual guidance, demonstrating superior efficiency and accuracy in multi-class medical segmentation tasks compared to sequential prompt-based approaches. Medal S will be publicly available at https://github.com/yinghemedical/Medal-S.
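To make the prompting mechanism more concrete, below is a minimal, self-contained sketch (not the authors' implementation) of the core idea described in the abstract: each class gets one native-resolution spatial-prompt channel aligned with one text embedding, and all class masks are refined in parallel by a lightweight 3D convolutional module. The module name `PromptRefiner3D`, the embedding dimension, channel counts, and tensor shapes are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: per-class spatial prompts aligned channel-wise with
# text embeddings, refined in parallel by a small 3D conv head. All names,
# shapes, and hyperparameters are assumptions for demonstration.
import torch
import torch.nn as nn


class PromptRefiner3D(nn.Module):
    """Hypothetical refinement head: image + per-class coarse masks + text embeddings -> refined masks."""

    def __init__(self, text_dim: int = 768, hidden: int = 32):
        super().__init__()
        # Project each class's text embedding to a per-channel gate.
        self.text_proj = nn.Linear(text_dim, hidden)
        # Lightweight 3D conv stack shared across classes (image + 1 prompt channel in).
        self.encode = nn.Sequential(
            nn.Conv3d(2, hidden, kernel_size=3, padding=1),
            nn.InstanceNorm3d(hidden),
            nn.ReLU(inplace=True),
        )
        self.decode = nn.Conv3d(hidden, 1, kernel_size=1)

    def forward(self, image, coarse_masks, text_emb):
        # image:        (B, 1, D, H, W) volume at native resolution
        # coarse_masks: (B, C, D, H, W) one spatial-prompt channel per class
        # text_emb:     (B, C, text_dim) one text embedding per class
        B, C, D, H, W = coarse_masks.shape
        # Fold the class axis into the batch axis so all classes are refined in parallel.
        img = image.unsqueeze(1).expand(B, C, 1, D, H, W).reshape(B * C, 1, D, H, W)
        msk = coarse_masks.reshape(B * C, 1, D, H, W)
        feat = self.encode(torch.cat([img, msk], dim=1))             # (B*C, hidden, D, H, W)
        gate = self.text_proj(text_emb).reshape(B * C, -1, 1, 1, 1)  # channel-wise text conditioning
        logits = self.decode(feat * gate)                            # (B*C, 1, D, H, W)
        return logits.reshape(B, C, D, H, W)


if __name__ == "__main__":
    refiner = PromptRefiner3D()
    out = refiner(
        torch.randn(1, 1, 32, 64, 64),   # image volume
        torch.rand(1, 4, 32, 64, 64),    # 4 coarse per-class masks as spatial prompts
        torch.randn(1, 4, 768),          # 4 text embeddings
    )
    print(out.shape)  # torch.Size([1, 4, 32, 64, 64])
```

In this sketch, processing all classes in one batched forward pass stands in for the paper's parallel spatial prompting, which is what the abstract credits for the >90% inference-time reduction over sequential per-class prompting.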