
ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking

May 13, 2025
Authors: Haofeng Liu, Mingqi Gao, Xuxiao Luo, Ziyue Wang, Guanyi Qin, Junde Wu, Yueming Jin
cs.AI

Abstract

Surgical scene segmentation is critical in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Referring surgical segmentation has recently emerged, as it provides surgeons with an interactive experience for segmenting the target object. However, existing methods are limited by low efficiency and short-term tracking, hindering their applicability in complex real-world surgical scenarios. In this paper, we introduce ReSurgSAM2, a two-stage surgical referring segmentation framework that leverages Segment Anything Model 2 to perform text-referred target detection, followed by tracking with reliable initial frame identification and diversity-driven long-term memory. For the detection stage, we propose a cross-modal spatial-temporal Mamba to generate precise detection and segmentation results. Based on these results, our credible initial frame selection strategy identifies a reliable frame for the subsequent tracking. Upon selecting the initial frame, our method transitions to the tracking stage, where it incorporates a diversity-driven memory mechanism that maintains a credible and diverse memory bank, ensuring consistent long-term tracking. Extensive experiments demonstrate that ReSurgSAM2 achieves substantial improvements in accuracy and efficiency compared to existing methods, operating in real time at 61.2 FPS. Our code and datasets will be available at https://github.com/jinlab-imvr/ReSurgSAM2.
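
To make the two-stage design concrete, below is a minimal Python sketch of the pipeline the abstract describes: text-referred detection runs until a credible initial frame is found, after which the method switches to memory-based tracking with a diversity-driven memory bank. All function names, classes, and thresholds (detect_referred_target, track_with_memory, mask_similarity, MemoryBank, the confidence cutoffs) are hypothetical placeholders for illustration, not the authors' implementation or the SAM2 API.

```python
# Minimal sketch of the two-stage ReSurgSAM2-style pipeline described in the abstract.
# All names and thresholds here are illustrative assumptions, not the authors' code.
from dataclasses import dataclass, field
import random


def detect_referred_target(frame, text):
    """Placeholder for the text-referred detection stage
    (the cross-modal spatial-temporal Mamba in the paper)."""
    mask = [[0]]                  # dummy mask
    confidence = random.random()  # dummy credibility score
    return mask, confidence


def track_with_memory(frame, memory):
    """Placeholder for memory-conditioned mask propagation (SAM2-style tracking)."""
    mask = memory.entries[-1][1] if memory.entries else [[0]]
    confidence = random.random()
    return mask, confidence


def mask_similarity(a, b):
    """Placeholder similarity between two masks (e.g. IoU in practice)."""
    return random.random()


@dataclass
class MemoryBank:
    """Diversity-driven memory: keep only credible predictions that differ
    enough from what is already stored (illustrative policy)."""
    capacity: int = 8
    diversity_thresh: float = 0.7
    entries: list = field(default_factory=list)  # list of (frame, mask) pairs

    def maybe_add(self, frame, mask, confidence, conf_thresh=0.8):
        if confidence < conf_thresh:
            return  # discard low-credibility predictions
        if any(mask_similarity(mask, m) > self.diversity_thresh for _, m in self.entries):
            return  # too similar to an existing memory entry
        self.entries.append((frame, mask))
        if len(self.entries) > self.capacity:
            self.entries.pop(1)  # keep the initial frame, drop the oldest of the rest


def resurgsam2_like_pipeline(frames, text, init_conf_thresh=0.9):
    memory, results = MemoryBank(), []
    tracking = False
    for frame in frames:
        if not tracking:
            # Stage 1: text-referred detection until a credible initial frame is found.
            mask, conf = detect_referred_target(frame, text)
            if conf >= init_conf_thresh:
                memory.maybe_add(frame, mask, conf, conf_thresh=0.0)  # always keep the initial frame
                tracking = True
        else:
            # Stage 2: long-term tracking with the diversity-driven memory bank.
            mask, conf = track_with_memory(frame, memory)
            memory.maybe_add(frame, mask, conf)
        results.append(mask)
    return results


if __name__ == "__main__":
    dummy_frames = [f"frame_{i}" for i in range(10)]
    masks = resurgsam2_like_pipeline(dummy_frames, "the bleeding vessel")
    print(len(masks), "masks produced")
```

The sketch only captures the control flow: a credibility threshold gates the switch from detection to tracking, and the memory bank admits a new entry only when it is both confident and sufficiently different from stored entries, which is what keeps long-term tracking consistent in the paper's design.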
