Unifying Segment Anything in Microscopy with Multimodal Large Language Model

May 16, 2025
Authors: Manyu Li, Ruian He, Zixian Zhang, Weimin Tan, Bo Yan
cs.AI

Abstract

Accurate segmentation of regions of interest in biomedical images holds substantial value in image analysis. Although several foundation models for biomedical segmentation have achieved excellent performance on certain datasets, they typically demonstrate sub-optimal performance on unseen domain data. We attribute this deficiency to the lack of vision-language knowledge before segmentation. Multimodal Large Language Models (MLLMs) bring outstanding understanding and reasoning capabilities to multimodal tasks, which inspires us to leverage MLLMs to inject Vision-Language Knowledge (VLK), thereby enabling vision models to demonstrate superior generalization capabilities on cross-domain datasets. In this paper, we propose using MLLMs to guide SAM in learning microscopy cross-domain data, unifying Segment Anything in Microscopy; we name the resulting method uLLSAM. Specifically, we propose the Vision-Language Semantic Alignment (VLSA) module, which injects VLK into the Segment Anything Model (SAM). We find that after SAM receives global VLK prompts, its performance improves significantly, but it remains deficient in boundary contour perception. Therefore, we further propose Semantic Boundary Regularization (SBR) to prompt SAM. Our method achieves performance improvements of 7.71% in Dice and 12.10% in SA across 9 in-domain microscopy datasets, reaching state-of-the-art performance. It also demonstrates improvements of 6.79% in Dice and 10.08% in SA across 10 out-of-domain datasets, exhibiting strong generalization capabilities. Code is available at https://github.com/ieellee/uLLSAM.
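
To make the two mechanisms the abstract names concrete, below is a minimal PyTorch sketch: a VLSA-style projection that maps a pooled MLLM embedding into one extra global prompt token for SAM's decoder, and a simple gradient-matching loss standing in for the SBR term. All module names, dimensions, and the exact loss form are our assumptions for illustration, not the authors' implementation; the real code is at https://github.com/ieellee/uLLSAM.

import torch
import torch.nn as nn
import torch.nn.functional as F


class VLSAPrompt(nn.Module):
    """Sketch of the VLSA idea: project a pooled MLLM embedding into
    SAM's prompt-token space so it can be injected as a global prompt.
    The dimensions here are hypothetical."""

    def __init__(self, mllm_dim: int = 4096, prompt_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(mllm_dim, prompt_dim),
            nn.GELU(),
            nn.Linear(prompt_dim, prompt_dim),
        )

    def forward(self, vlk_embedding: torch.Tensor) -> torch.Tensor:
        # (B, mllm_dim) -> (B, 1, prompt_dim): one global VLK prompt token
        return self.proj(vlk_embedding).unsqueeze(1)


def boundary_regularization(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for SBR: penalize disagreement between the
    spatial gradients of predicted and ground-truth masks, encouraging
    sharper boundary contour perception. Inputs are (B, 1, H, W)."""
    dx_p, dx_g = pred[..., :, 1:] - pred[..., :, :-1], gt[..., :, 1:] - gt[..., :, :-1]
    dy_p, dy_g = pred[..., 1:, :] - pred[..., :-1, :], gt[..., 1:, :] - gt[..., :-1, :]
    return F.l1_loss(dx_p, dx_g) + F.l1_loss(dy_p, dy_g)


if __name__ == "__main__":
    vlsa = VLSAPrompt()
    vlk = torch.randn(2, 4096)           # pooled per-image MLLM embedding
    prompt_tokens = vlsa(vlk)            # (2, 1, 256), would be fed to SAM's decoder
    pred = torch.rand(2, 1, 64, 64)      # predicted mask probabilities
    gt = (torch.rand(2, 1, 64, 64) > 0.5).float()
    print(prompt_tokens.shape, boundary_regularization(pred, gt).item())

In this reading, the global VLK token supplies domain-level semantics (what kind of structure to segment), while the boundary term compensates for the contour-perception deficiency the abstract reports after adding global prompts alone.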
