Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
February 16, 2026
Authors: Aryan Das, Tanishq Rachamalla, Koushik Biswas, Swalpa Kumar Roy, Vinay Kumar Verma
cs.AI
Abstract
We introduce a novel uncertainty-aware multimodal segmentation framework that leverages both radiological images and associated clinical text for precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which jointly captures spatial overlap, spectral consistency, and predictive uncertainty in a unified objective. This formulation improves model reliability in complex clinical settings with poor image quality. Extensive experiments on three publicly available medical datasets (QATA-COVID19, MosMed++, and Kvasir-SEG) demonstrate that our method achieves superior segmentation performance while being significantly more computationally efficient than existing state-of-the-art (SoTA) approaches. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation. Code: https://github.com/arya-domain/UA-VLS
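The abstract describes the SEU Loss as combining three terms: spatial overlap, spectral consistency, and predictive uncertainty. The sketch below illustrates one plausible NumPy realisation of such a composite objective; the specific term definitions (soft Dice for overlap, an L1 gap between normalised FFT magnitudes for spectral consistency, mean binary entropy for uncertainty) and the weights are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def seu_loss(pred, target, w_dice=1.0, w_spec=0.5, w_ent=0.1, eps=1e-8):
    """Illustrative sketch of a Spectral-Entropic Uncertainty (SEU) style loss.

    pred:   predicted foreground probabilities, shape (H, W), values in [0, 1]
    target: binary ground-truth mask, shape (H, W)
    """
    # Spatial-overlap term: soft Dice loss (0 when pred matches target exactly).
    inter = (pred * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

    # Spectral-consistency term: L1 gap between normalised 2-D FFT magnitudes.
    fp = np.abs(np.fft.fft2(pred))
    ft = np.abs(np.fft.fft2(target))
    fp = fp / (fp.sum() + eps)
    ft = ft / (ft.sum() + eps)
    spectral = np.abs(fp - ft).mean()

    # Uncertainty term: mean per-pixel binary entropy of the prediction,
    # penalising diffuse, low-confidence outputs.
    p = np.clip(pred, eps, 1.0 - eps)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)).mean()

    return w_dice * dice + w_spec * spectral + w_ent * entropy
```

A confident, correct mask drives all three terms toward zero, while an ambiguous or spectrally mismatched prediction is penalised even when its pixel-wise overlap is moderate.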