UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

Abstract

Multi-modal interpretation of biomedical images opens up new opportunities in biomedical image analysis. Conventional AI approaches typically rely on disjointed training, i.e., Large Language Models (LLMs) for clinical text generation and separate segmentation models for target extraction. This separation makes real-world deployment inflexible and fails to leverage holistic biomedical information. To address this, we introduce UniBiomed, the first universal foundation model for grounded biomedical image interpretation. UniBiomed integrates a Multi-modal Large Language Model (MLLM) with the Segment Anything Model (SAM), unifying the generation of clinical text and the segmentation of the corresponding biomedical objects for grounded interpretation. As a result, UniBiomed can tackle a wide range of biomedical tasks across ten diverse biomedical imaging modalities. To develop UniBiomed, we curate a large-scale dataset comprising over 27 million triplets of images, annotations, and text descriptions across ten imaging modalities. Extensive validation on 84 internal and external datasets demonstrates that UniBiomed achieves state-of-the-art performance in segmentation, disease recognition, region-aware diagnosis, visual question answering, and report generation. Moreover, unlike previous models that rely on clinical experts to pre-diagnose images and manually craft precise textual or visual prompts, UniBiomed provides automated, end-to-end grounded interpretation for biomedical image analysis. This represents a paradigm shift in clinical workflows that will significantly improve diagnostic efficiency. In summary, UniBiomed represents a breakthrough in biomedical AI, unlocking powerful grounded interpretation capabilities for more accurate and efficient biomedical image analysis.
