
EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence

September 18, 2025
作者: Chaoyin She, Ruifang Lu, Lida Chen, Wei Wang, Qinghua Huang
cs.AI

Abstract

Ultrasound imaging has become the preferred imaging modality for early cancer screening thanks to its lack of ionizing radiation, low cost, and real-time imaging capability. However, conventional ultrasound diagnosis relies heavily on physician expertise, making it highly subjective and limiting diagnostic efficiency. Vision-language models (VLMs) offer a promising solution, but existing general-purpose models show limited knowledge of ultrasound medical tasks, poor generalization in multi-organ lesion recognition, and low efficiency across multi-task diagnostics. To address these limitations, we propose EchoVLM, a vision-language model designed specifically for ultrasound medical imaging. The model employs a Mixture-of-Experts (MoE) architecture trained on data spanning seven anatomical regions, enabling it to perform multiple tasks, including ultrasound report generation, diagnosis, and visual question answering (VQA). On the ultrasound report generation task, EchoVLM significantly improves on Qwen2-VL, by 10.15 points in BLEU-1 and 4.77 points in ROUGE-1. These findings suggest that EchoVLM has substantial potential to enhance diagnostic accuracy in ultrasound imaging, providing a viable technical solution for future clinical applications. Source code and model weights are available at https://github.com/Asunatan/EchoVLM.
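
The central design choice named in the abstract is a dynamic Mixture-of-Experts layer inside the vision-language model. As a rough illustration of how top-k token routing across per-region experts can work, here is a minimal PyTorch sketch; the class name DynamicMoELayer, the expert count of 7 (echoing the seven anatomical regions), the top-2 routing, and all dimensions are illustrative assumptions rather than EchoVLM's released implementation (see the linked repository for the actual code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMoELayer(nn.Module):
    """Top-k gated mixture-of-experts over token features.

    Hypothetical sketch: the expert count, hidden sizes, and top-k
    routing here are illustrative, not EchoVLM's published config.
    """
    def __init__(self, dim: int, num_experts: int = 7, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One feed-forward expert per (hypothetical) anatomical region.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # Router scores each token against every expert.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        logits = self.router(x)                         # (B, S, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize selected scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route a batch of fused vision-language tokens through the MoE block.
layer = DynamicMoELayer(dim=768)
tokens = torch.randn(2, 16, 768)
print(layer(tokens).shape)  # torch.Size([2, 16, 768])
```

Routing each token to only the top-k experts keeps per-token compute roughly constant as experts are added, which is the usual motivation for a dynamic MoE over a single monolithic feed-forward block.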