Unifying Segment Anything in Microscopy with Multimodal Large Language Model
May 16, 2025
Authors: Manyu Li, Ruian He, Zixian Zhang, Weimin Tan, Bo Yan
cs.AI
Abstract
Accurate segmentation of regions of interest in biomedical images holds
substantial value in image analysis. Although several foundation models for
biomedical segmentation have achieved excellent performance on
certain datasets, they typically demonstrate sub-optimal performance on unseen
domain data. We attribute this deficiency to the lack of vision-language knowledge before
segmentation. Multimodal Large Language Models (MLLMs) bring outstanding
understanding and reasoning capabilities to multimodal tasks, which inspires us
to leverage MLLMs to inject Vision-Language Knowledge (VLK), thereby enabling
vision models to demonstrate superior generalization capabilities on
cross-domain datasets. In this paper, we propose using MLLMs to guide SAM in
learning cross-domain microscopy data, unifying Segment Anything in Microscopy,
named uLLSAM. Specifically, we propose the Vision-Language Semantic Alignment
(VLSA) module, which injects VLK into the Segment Anything Model (SAM). We find
that after SAM receives global VLK prompts, its performance improves
significantly, but there are deficiencies in boundary contour perception.
Therefore, we further propose Semantic Boundary Regularization (SBR) to prompt
SAM. Our method achieves improvements of 7.71% in Dice and 12.10% in SA
across 9 in-domain microscopy datasets, reaching state-of-the-art
performance. Our method also demonstrates improvements of 6.79% in Dice and
10.08% in SA across 10 out-of-domain datasets, exhibiting strong generalization
capabilities. Code is available at https://github.com/ieellee/uLLSAM.
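
The abstract describes two ideas: a VLSA module that injects a global vision-language embedding from an MLLM into SAM as a prompt, and an SBR term that sharpens boundary perception. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch, not the authors' implementation: the module names, embedding dimensions, number of prompt tokens, and the gradient-based boundary loss are all assumptions made for illustration; the released code at https://github.com/ieellee/uLLSAM is the authoritative reference.

```python
# Conceptual sketch (not the authors' code): project a global MLLM
# embedding into a SAM-style prompt-token space, and regularize mask
# boundaries with a simple gradient-agreement term standing in for SBR.
# All names, dimensions, and the loss form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VLSAPromptSketch(nn.Module):
    """Maps an MLLM embedding to a few prompt tokens in SAM's prompt space."""

    def __init__(self, mllm_dim: int = 4096, prompt_dim: int = 256, num_tokens: int = 4):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Sequential(
            nn.Linear(mllm_dim, prompt_dim * num_tokens),
            nn.LayerNorm(prompt_dim * num_tokens),
        )

    def forward(self, mllm_embed: torch.Tensor) -> torch.Tensor:
        # mllm_embed: (B, mllm_dim) global vision-language embedding
        tokens = self.proj(mllm_embed)  # (B, prompt_dim * num_tokens)
        return tokens.view(mllm_embed.size(0), self.num_tokens, -1)


def boundary_regularizer(pred_logits: torch.Tensor, gt_mask: torch.Tensor) -> torch.Tensor:
    """Penalizes disagreement between predicted and ground-truth mask gradients
    (a stand-in for the paper's Semantic Boundary Regularization)."""
    pred = torch.sigmoid(pred_logits)

    def grad_mag(x):
        # Finite differences approximate a boundary map; pad back to input size.
        dx = x[..., :, 1:] - x[..., :, :-1]
        dy = x[..., 1:, :] - x[..., :-1, :]
        return F.pad(dx.abs(), (0, 1)), F.pad(dy.abs(), (0, 0, 0, 1))

    pdx, pdy = grad_mag(pred)
    gdx, gdy = grad_mag(gt_mask.float())
    return F.l1_loss(pdx, gdx) + F.l1_loss(pdy, gdy)


if __name__ == "__main__":
    vlsa = VLSAPromptSketch()
    mllm_embed = torch.randn(2, 4096)        # placeholder MLLM output
    prompt_tokens = vlsa(mllm_embed)         # would join SAM's sparse prompt tokens
    pred = torch.randn(2, 1, 64, 64)         # placeholder mask logits from SAM's decoder
    gt = torch.rand(2, 1, 64, 64) > 0.5      # placeholder ground-truth mask
    loss = boundary_regularizer(pred, gt)
    print(prompt_tokens.shape, loss.item())
```

In this reading, the projected tokens would be concatenated with SAM's usual sparse prompts before the mask decoder, and the boundary term would be added to the segmentation loss with a small weight; both choices are guesses at how the components described in the abstract could fit together.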