Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
February 16, 2026
Authors: Aryan Das, Tanishq Rachamalla, Koushik Biswas, Swalpa Kumar Roy, Vinay Kumar Verma
cs.AI
Abstract
We introduce a novel uncertainty-aware multimodal segmentation framework that leverages both radiological images and associated clinical text for precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which jointly captures spatial overlap, spectral consistency, and predictive uncertainty in a unified objective. This formulation improves model reliability in complex clinical settings with poor image quality. Extensive experiments on three publicly available medical datasets (QATA-COVID19, MosMed++, and Kvasir-SEG) demonstrate that our method achieves superior segmentation performance while being significantly more computationally efficient than existing State-of-the-Art (SoTA) approaches. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation tasks. Code: https://github.com/arya-domain/UA-VLS
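The abstract names three ingredients of the SEU loss (spatial overlap, spectral consistency, predictive uncertainty) without giving its exact form. The sketch below is a hypothetical illustration of how such a composite objective could be assembled, not the authors' implementation: spatial overlap via a soft Dice term, spectral consistency via the gap between normalised FFT magnitude spectra, and uncertainty via the mean binary entropy of the predictions. The weights `w_spatial`, `w_spectral`, and `w_entropy` are assumed hyperparameters.

```python
import numpy as np

def seu_loss(pred, target, w_spatial=1.0, w_spectral=0.5, w_entropy=0.1, eps=1e-7):
    """Hypothetical sketch of a Spectral-Entropic-Uncertainty-style loss.

    pred   : predicted foreground probabilities, shape (H, W), values in (0, 1)
    target : binary ground-truth mask, shape (H, W)
    """
    # 1. Spatial overlap: soft Dice loss (0 when pred matches target exactly).
    inter = (pred * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

    # 2. Spectral consistency: mean L1 gap between size-normalised
    #    FFT magnitude spectra of prediction and ground truth.
    fp = np.fft.fft2(pred) / pred.size
    ft = np.fft.fft2(target) / target.size
    spectral = np.abs(np.abs(fp) - np.abs(ft)).mean()

    # 3. Predictive uncertainty: mean binary entropy of the predictions,
    #    penalising ambiguous (near-0.5) probabilities.
    p = np.clip(pred, eps, 1.0 - eps)
    entropy = (-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))).mean()

    return w_spatial * dice + w_spectral * spectral + w_entropy * entropy
```

A confident, correct prediction drives all three terms toward zero, while a confidently wrong mask is penalised mainly by the Dice and spectral terms; the relative weighting of the three components is a design choice this sketch leaves open.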