Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
August 15, 2024
Authors: Haofeng Liu, Erli Zhang, Junde Wu, Mingxuan Hong, Yueming Jin
cs.AI
Abstract
Surgical video segmentation is a critical task in computer-assisted surgery
and is vital for enhancing surgical quality and patient outcomes. Recently, the
Segment Anything Model 2 (SAM2) framework has shown remarkable advances in
image and video segmentation. However, SAM2 struggles with efficiency due to
the high computational demands of processing high-resolution images and complex
and long-range temporal dynamics in surgical videos. To address these
challenges, we introduce Surgical SAM 2 (SurgSAM-2), an advanced model that
combines SAM2 with an Efficient Frame Pruning (EFP) mechanism to facilitate
real-time surgical video segmentation. The EFP mechanism dynamically manages
the memory bank by selectively retaining only the most informative frames,
reducing memory usage and computational cost while maintaining high
segmentation accuracy. Our extensive experiments demonstrate that SurgSAM-2
significantly improves both efficiency and segmentation accuracy compared to
the vanilla SAM2. Remarkably, SurgSAM-2 achieves 3× the FPS of
SAM2, while also delivering state-of-the-art performance after fine-tuning with
lower-resolution data. These advancements establish SurgSAM-2 as a leading
model for surgical video analysis, making real-time surgical video segmentation
in resource-constrained environments a feasible reality.
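
The abstract describes EFP as keeping only the most informative frames in the memory bank, but it does not say how informativeness is scored. The following is a minimal sketch of that idea, assuming (purely for illustration) that a stored frame is scored by the cosine similarity between its feature vector and the current frame's feature vector. The class `PrunedMemoryBank`, its `capacity` parameter, and the toy two-dimensional features are hypothetical and are not taken from the SurgSAM-2 code.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as uninformative
    return dot / (norm_a * norm_b)


@dataclass
class PrunedMemoryBank:
    """Hypothetical fixed-capacity memory bank (illustrative only)."""
    capacity: int
    frames: list = field(default_factory=list)  # (frame_index, feature) pairs

    def add(self, frame_index, feature):
        """Store the features of a processed frame."""
        self.frames.append((frame_index, feature))

    def prune(self, query_feature):
        """Keep the `capacity` stored frames most similar to the query.

        Assumption: similarity to the current frame stands in for
        "informativeness"; the abstract does not specify the criterion.
        """
        if len(self.frames) <= self.capacity:
            return
        ranked = sorted(
            self.frames,
            key=lambda item: cosine_similarity(item[1], query_feature),
            reverse=True,
        )
        # Restore temporal order among the frames that survive pruning.
        self.frames = sorted(ranked[: self.capacity], key=lambda item: item[0])


bank = PrunedMemoryBank(capacity=2)
bank.add(0, [1.0, 0.0])
bank.add(1, [0.0, 1.0])   # orthogonal to the query, so it is pruned
bank.add(2, [0.9, 0.1])
bank.prune(query_feature=[1.0, 0.0])
kept_indices = [idx for idx, _ in bank.frames]
print(kept_indices)  # [0, 2]
```

Bounding the bank at a fixed `capacity` is what limits memory use and per-frame attention cost in this sketch; a real system would operate on encoder feature maps rather than short lists.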