UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

Abstract

Multi-modal interpretation of biomedical images opens up new opportunities in biomedical image analysis. Conventional AI approaches typically rely on disjoint training, i.e., separate Large Language Models (LLMs) for clinical text generation and segmentation models for target extraction, which makes real-world deployment inflexible and prevents them from leveraging holistic biomedical information. To this end, we introduce UniBiomed, the first universal foundation model for grounded biomedical image interpretation. UniBiomed is built on a novel integration of a Multi-modal Large Language Model (MLLM) and the Segment Anything Model (SAM), which unifies the generation of clinical text and the segmentation of the corresponding biomedical objects for grounded interpretation. As a result, UniBiomed can tackle a wide range of biomedical tasks across ten diverse biomedical imaging modalities. To develop UniBiomed, we curate a large-scale dataset of over 27 million triplets of images, annotations, and text descriptions across ten imaging modalities. Extensive validation on 84 internal and external datasets demonstrates that UniBiomed achieves state-of-the-art performance in segmentation, disease recognition, region-aware diagnosis, visual question answering, and report generation. Moreover, unlike previous models that rely on clinical experts to pre-diagnose images and manually craft precise textual or visual prompts, UniBiomed provides automated, end-to-end grounded interpretation for biomedical image analysis. This represents a paradigm shift in clinical workflows that will significantly improve diagnostic efficiency. In summary, UniBiomed is a breakthrough in biomedical AI, unlocking powerful grounded interpretation capabilities for more accurate and efficient biomedical image analysis.
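The abstract states only that UniBiomed integrates an MLLM with SAM so that report text and segmentation masks come out of a single model, without expert-crafted prompts. One common way to couple the two, used by earlier reasoning-segmentation models, is to let the language model emit a special segmentation token whose hidden state is projected into a prompt embedding for a SAM-style mask decoder. The sketch below illustrates that data flow under that assumption only. The token names `<p>`, `</p>`, `[SEG]`, the function `ground_output`, and the toy projection and decoder are hypothetical stand-ins, not UniBiomed's published implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Mask = List[List[int]]        # binary mask as rows of 0/1
Vector = Sequence[float]

# Hypothetical special tokens; the abstract does not specify UniBiomed's vocabulary.
P_OPEN, P_CLOSE, SEG_TOKEN = "<p>", "</p>", "[SEG]"


@dataclass
class GroundedFinding:
    phrase: str   # text span the mask grounds, e.g. "liver tumor"
    mask: Mask    # segmentation produced for that span


def ground_output(
    tokens: Sequence[str],
    hidden_states: Sequence[Vector],
    project: Callable[[Vector], Vector],
    decode_mask: Callable[[Vector], Mask],
) -> Tuple[str, List[GroundedFinding]]:
    """Split MLLM output into plain report text and one (phrase, mask) pair per [SEG] token."""
    words: List[str] = []
    findings: List[GroundedFinding] = []
    current: List[str] = []
    last_phrase = ""
    in_phrase = False
    for tok, h in zip(tokens, hidden_states):
        if tok == P_OPEN:
            in_phrase, current = True, []
        elif tok == P_CLOSE:
            in_phrase, last_phrase = False, " ".join(current)
        elif tok == SEG_TOKEN:
            # The [SEG] hidden state becomes the mask prompt, so no manually
            # drawn point or box prompt is required.
            findings.append(GroundedFinding(last_phrase, decode_mask(project(h))))
        else:
            words.append(tok)
            if in_phrase:
                current.append(tok)
    return " ".join(words), findings


def toy_project(h: Vector) -> Vector:
    # Hypothetical stand-in for a learned projection layer: scales the hidden state.
    return [2.0 * x for x in h]


def toy_decode_mask(prompt: Vector) -> Mask:
    # Hypothetical stand-in for a SAM-style mask decoder: thresholds 4 values into a 2x2 mask.
    return [[int(prompt[0] > 0), int(prompt[1] > 0)],
            [int(prompt[2] > 0), int(prompt[3] > 0)]]


tokens = ["A", "<p>", "liver", "tumor", "</p>", "[SEG]", "is", "visible"]
hidden = [[0.0] * 4] * 5 + [[0.5, -0.1, 0.3, -0.2]] + [[0.0] * 4] * 2
report, findings = ground_output(tokens, hidden, toy_project, toy_decode_mask)
print(report)               # A liver tumor is visible
print(findings[0].phrase)   # liver tumor
print(findings[0].mask)     # [[1, 0], [1, 0]]
```

The point of this design is that the text generator and the segmenter share one forward pass: the mask prompt is derived from the language model's own output rather than supplied by a clinician, which is the end-to-end property the abstract emphasizes. In a real system both stand-ins would be trained networks operating on image features.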