UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
April 30, 2025
Authors: Linshan Wu, Yuxiang Nie, Sunan He, Jiaxin Zhuang, Hao Chen
cs.AI
Abstract
Multi-modal interpretation of biomedical images opens up novel opportunities
in biomedical image analysis. Conventional AI approaches typically rely on
disjointed training, i.e., Large Language Models (LLMs) for clinical text
generation and segmentation models for target extraction, which results in
inflexible real-world deployment and a failure to leverage holistic biomedical
information. To this end, we introduce UniBiomed, the first universal
foundation model for grounded biomedical image interpretation. UniBiomed is
based on a novel integration of a Multi-modal Large Language Model (MLLM) and
the Segment Anything Model (SAM), which effectively unifies the generation of
clinical texts and the segmentation of corresponding biomedical objects for
grounded interpretation. In this way, UniBiomed is capable of tackling a wide
range of biomedical tasks across ten diverse biomedical imaging modalities. To
develop UniBiomed, we curate a large-scale dataset comprising over 27 million
triplets of images, annotations, and text descriptions across ten imaging
modalities. Extensive validation on 84 internal and external datasets
demonstrates that UniBiomed achieves state-of-the-art performance in
segmentation, disease recognition, region-aware diagnosis, visual question
answering, and report generation. Moreover, unlike previous models that rely on
clinical experts to pre-diagnose images and manually craft precise textual or
visual prompts, UniBiomed can provide automated and end-to-end grounded
interpretation for biomedical image analysis. This represents a novel paradigm
shift in clinical workflows, which will significantly improve diagnostic
efficiency. In summary, UniBiomed represents a novel breakthrough in biomedical
AI, unlocking powerful grounded interpretation capabilities for more accurate
and efficient biomedical image analysis.