Uncertainty-Aware Vision-Language Segmentation for Medical Imaging

Abstract

We introduce a novel uncertainty-aware multimodal segmentation framework that leverages both radiological images and associated clinical text for precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which jointly captures spatial overlap, spectral consistency, and predictive uncertainty in a unified objective. This formulation improves model reliability in complex clinical settings where image quality is poor. Extensive experiments on three publicly available medical datasets (QATA-COVID19, MosMed++, and Kvasir-SEG) demonstrate that our method achieves superior segmentation performance while being significantly more computationally efficient than existing state-of-the-art (SoTA) approaches. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation tasks. Code: https://github.com/arya-domain/UA-VLS
