UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

Abstract

Multi-modal interpretation of biomedical images opens up new opportunities in biomedical image analysis. Conventional AI approaches typically rely on disjointed training, i.e., Large Language Models (LLMs) for clinical text generation and separate segmentation models for target extraction, which results in inflexible real-world deployment and a failure to leverage holistic biomedical information. To this end, we introduce UniBiomed, the first universal foundation model for grounded biomedical image interpretation. UniBiomed is based on a novel integration of a Multi-modal Large Language Model (MLLM) and the Segment Anything Model (SAM), which unifies the generation of clinical texts and the segmentation of the corresponding biomedical objects for grounded interpretation. In this way, UniBiomed can tackle a wide range of biomedical tasks across ten diverse biomedical imaging modalities. To develop UniBiomed, we curate a large-scale dataset comprising over 27 million triplets of images, annotations, and text descriptions across ten imaging modalities. Extensive validation on 84 internal and external datasets demonstrates that UniBiomed achieves state-of-the-art performance in segmentation, disease recognition, region-aware diagnosis, visual question answering, and report generation. Moreover, unlike previous models that rely on clinical experts to pre-diagnose images and manually craft precise textual or visual prompts, UniBiomed provides automated, end-to-end grounded interpretation for biomedical image analysis. This represents a paradigm shift in clinical workflows, which will significantly improve diagnostic efficiency. In summary, UniBiomed represents a breakthrough in biomedical AI, unlocking powerful grounded interpretation capabilities for more accurate and efficient biomedical image analysis.

