MedSAM2: Segment Anything in 3D Medical Images and Videos

Abstract

Medical image and video segmentation is a critical task for precision medicine. Considerable progress has been made in developing task- or modality-specific and generalist models for 2D images. However, few studies have built general-purpose models for 3D images and videos and evaluated them with comprehensive user studies. Here, we present MedSAM2, a promptable segmentation foundation model for 3D image and video segmentation. The model is developed by fine-tuning the Segment Anything Model 2 on a large medical dataset of over 455,000 3D image-mask pairs and 76,000 frames, and it outperforms previous models across a wide range of organs, lesions, and imaging modalities. Furthermore, we implement a human-in-the-loop pipeline to facilitate the creation of large-scale datasets. With this pipeline, we conducted what is, to the best of our knowledge, the most extensive user study to date, involving the annotation of 5,000 CT lesions, 3,984 liver MRI lesions, and 251,550 echocardiogram video frames. The study demonstrates that MedSAM2 can reduce manual annotation costs by more than 85%. MedSAM2 is also integrated into widely used platforms with user-friendly interfaces for local and cloud deployment, making it a practical tool for efficient, scalable, and high-quality segmentation in both research and healthcare settings.
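The abstract does not say how "manual costs" are measured. The sketch below makes the 85% figure concrete under one assumption: cost is the annotation time per case, and the reduction is the share of manual time saved when annotators correct model predictions instead of drawing masks from scratch. The function name `relative_cost_reduction` and the timings are hypothetical placeholders. They are not results from the study.

```python
def relative_cost_reduction(manual_cost: float, assisted_cost: float) -> float:
    """Fraction of manual cost saved by a model-assisted workflow.

    Hypothetical metric: 1 - assisted / manual, where both costs are measured
    in the same unit (e.g. annotation seconds per lesion).
    """
    if manual_cost <= 0:
        raise ValueError("manual_cost must be positive")
    return 1.0 - assisted_cost / manual_cost


# Hypothetical per-lesion annotation times in seconds (placeholders, not study data).
manual_seconds = 600.0
assisted_seconds = 80.0

reduction = relative_cost_reduction(manual_seconds, assisted_seconds)
print(f"cost reduction: {reduction:.1%}")  # prints "cost reduction: 86.7%"
```

Under this definition, a reduction above 0.85 means the assisted workflow needs less than 15% of the effort of fully manual annotation.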
