UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

Abstract

Multi-modal interpretation of biomedical images opens up new opportunities in biomedical image analysis. Conventional AI approaches typically rely on disjointed training, i.e., Large Language Models (LLMs) for clinical text generation and separate segmentation models for target extraction, which makes real-world deployment inflexible and fails to leverage holistic biomedical information. To this end, we introduce UniBiomed, the first universal foundation model for grounded biomedical image interpretation. UniBiomed is based on a novel integration of a Multi-modal Large Language Model (MLLM) and the Segment Anything Model (SAM), which unifies the generation of clinical texts and the segmentation of the corresponding biomedical objects for grounded interpretation. As a result, UniBiomed can tackle a wide range of biomedical tasks across ten diverse biomedical imaging modalities. To develop UniBiomed, we curate a large-scale dataset comprising over 27 million triplets of images, annotations, and text descriptions across ten imaging modalities. Extensive validation on 84 internal and external datasets demonstrates that UniBiomed achieves state-of-the-art performance in segmentation, disease recognition, region-aware diagnosis, visual question answering, and report generation. Moreover, unlike previous models that rely on clinical experts to pre-diagnose images and manually craft precise textual or visual prompts, UniBiomed provides automated, end-to-end grounded interpretation for biomedical image analysis. This represents a paradigm shift in clinical workflows that will significantly improve diagnostic efficiency. In summary, UniBiomed represents a breakthrough in biomedical AI, unlocking powerful grounded interpretation capabilities for more accurate and efficient biomedical image analysis.
