
EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence

September 18, 2025
作者: Chaoyin She, Ruifang Lu, Lida Chen, Wei Wang, Qinghua Huang
cs.AI

Abstract

Ultrasound imaging has become the preferred modality for early cancer screening thanks to its freedom from ionizing radiation, low cost, and real-time imaging capability. However, conventional ultrasound diagnosis relies heavily on physician expertise, making it highly subjective and limiting diagnostic efficiency. Vision-language models (VLMs) offer a promising solution, but existing general-purpose models have limited knowledge of ultrasound medical tasks, generalize poorly to multi-organ lesion recognition, and are inefficient across multi-task diagnostics. To address these limitations, we propose EchoVLM, a vision-language model designed specifically for ultrasound medical imaging. The model employs a Mixture-of-Experts (MoE) architecture trained on data spanning seven anatomical regions, enabling it to perform multiple tasks, including ultrasound report generation, diagnosis, and visual question answering (VQA). Experimental results show that, compared with Qwen2-VL on the ultrasound report generation task, EchoVLM achieves significant improvements of 10.15 and 4.77 points in BLEU-1 and ROUGE-1 scores, respectively. These findings suggest that EchoVLM has substantial potential to improve diagnostic accuracy in ultrasound imaging, providing a viable technical solution for future clinical applications. Source code and model weights are available at https://github.com/Asunatan/EchoVLM.
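
The abstract does not spell out the MoE mechanism, so the following is a minimal PyTorch sketch of a top-k gated mixture-of-experts layer of the kind it describes. The class name, the seven-expert configuration (one expert per anatomical region), and the top-2 routing are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UltrasoundMoE(nn.Module):
    """Illustrative top-k gated mixture-of-experts layer.

    Hypothetical sketch: seven feed-forward experts (one per
    anatomical region) and a learned router that assigns each
    token to its top-k experts. Not EchoVLM's actual code.
    """

    def __init__(self, d_model: int = 1024, n_experts: int = 7, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        logits = self.router(x)                        # (B, S, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # route to top-k experts
        weights = F.softmax(weights, dim=-1)            # renormalize over top-k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Because only the top-k experts run per token, such a layer adds capacity specialized by region while keeping per-token compute close to that of a single dense feed-forward block.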
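For the reported metrics: BLEU-1 is clipped unigram precision with a brevity penalty, and ROUGE-1 measures unigram overlap (commonly reported as F1). Below is a self-contained sketch using these standard definitions, not the paper's exact evaluation pipeline; the sample report fragments are invented for illustration.

```python
from collections import Counter
import math

def bleu1(candidate: list[str], reference: list[str]) -> float:
    """Unigram BLEU: clipped precision times a brevity penalty."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(c, ref[w]) for w, c in cand.items())  # clipped matches
    precision = overlap / max(len(candidate), 1)
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1)
    )
    return bp * precision

def rouge1_f(candidate: list[str], reference: list[str]) -> float:
    """Unigram ROUGE reported as F1 of precision and recall."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    if overlap == 0:
        return 0.0
    p = overlap / len(candidate)
    r = overlap / len(reference)
    return 2 * p * r / (p + r)

# Invented example: generated report fragment vs. reference fragment
gen = "hypoechoic nodule in the right lobe of the thyroid".split()
ref = "a hypoechoic nodule is seen in the right thyroid lobe".split()
print(f"BLEU-1: {bleu1(gen, ref):.3f}, ROUGE-1 F1: {rouge1_f(gen, ref):.3f}")
```

These functions return values in [0, 1]; papers typically report them on a 0-100 scale, so the abstract's 10.15-point BLEU-1 gain corresponds to a 0.1015 difference here.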