Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
February 16, 2026
Authors: Aryan Das, Tanishq Rachamalla, Koushik Biswas, Swalpa Kumar Roy, Vinay Kumar Verma
cs.AI
Abstract
We introduce a novel uncertainty-aware multimodal segmentation framework that leverages both radiological images and associated clinical text for precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which jointly captures spatial overlap, spectral consistency, and predictive uncertainty in a unified objective. This formulation improves model reliability in complex clinical settings with poor image quality. Extensive experiments on three publicly available medical datasets (QATA-COVID19, MosMed++, and Kvasir-SEG) demonstrate that our method achieves superior segmentation performance while being significantly more computationally efficient than existing State-of-the-Art (SoTA) approaches. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation tasks. Code: https://github.com/arya-domain/UA-VLS
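The abstract names three ingredients of the SEU loss (spatial overlap, spectral consistency, predictive uncertainty) without giving its exact form. The sketch below is a hypothetical illustration of how such a composite objective could be assembled, not the authors' implementation: spatial overlap via a soft Dice term, spectral consistency via the gap between normalised FFT magnitude spectra, and uncertainty via the mean binary entropy of the predictions. The weights `w_spatial`, `w_spectral`, and `w_entropy` are assumed hyperparameters.

```python
import numpy as np

def seu_loss(pred, target, w_spatial=1.0, w_spectral=0.5, w_entropy=0.1, eps=1e-7):
    """Hypothetical sketch of a Spectral-Entropic-Uncertainty-style loss.

    pred   : predicted foreground probabilities, shape (H, W), values in (0, 1)
    target : binary ground-truth mask, shape (H, W)
    """
    # 1. Spatial overlap: soft Dice loss (0 when pred matches target exactly).
    inter = (pred * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

    # 2. Spectral consistency: mean L1 gap between size-normalised
    #    FFT magnitude spectra of prediction and ground truth.
    fp = np.fft.fft2(pred) / pred.size
    ft = np.fft.fft2(target) / target.size
    spectral = np.abs(np.abs(fp) - np.abs(ft)).mean()

    # 3. Predictive uncertainty: mean binary entropy of the predictions,
    #    penalising ambiguous (near-0.5) probabilities.
    p = np.clip(pred, eps, 1.0 - eps)
    entropy = (-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))).mean()

    return w_spatial * dice + w_spectral * spectral + w_entropy * entropy
```

A confident, correct prediction drives all three terms toward zero, while a confidently wrong mask is penalised mainly by the Dice and spectral terms; the relative weighting of the three components is a design choice this sketch leaves open.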