

ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking

May 13, 2025
Authors: Haofeng Liu, Mingqi Gao, Xuxiao Luo, Ziyue Wang, Guanyi Qin, Junde Wu, Yueming Jin
cs.AI

Abstract

Surgical scene segmentation is critical in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, referring surgical segmentation has emerged, offering surgeons an interactive experience for segmenting the target object. However, existing methods are limited by low efficiency and short-term tracking, hindering their applicability in complex real-world surgical scenarios. In this paper, we introduce ReSurgSAM2, a two-stage surgical referring segmentation framework that leverages Segment Anything Model 2 to perform text-referred target detection, followed by tracking with reliable initial frame identification and diversity-driven long-term memory. For the detection stage, we propose a cross-modal spatial-temporal Mamba to generate precise detection and segmentation results. Based on these results, our credible initial frame selection strategy identifies a reliable frame for the subsequent tracking. Upon selecting the initial frame, our method transitions to the tracking stage, where it incorporates a diversity-driven memory mechanism that maintains a credible and diverse memory bank, ensuring consistent long-term tracking. Extensive experiments demonstrate that ReSurgSAM2 achieves substantial improvements in accuracy and efficiency compared to existing methods, operating in real time at 61.2 FPS. Our code and datasets will be available at https://github.com/jinlab-imvr/ReSurgSAM2.
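To make the two-stage design described in the abstract concrete, the following is a minimal, hypothetical Python sketch of such a pipeline: a detection loop that waits for a credible initial frame, then a tracking loop that conditions each frame on a diversity-driven memory bank. The names (`refer_and_track`, `DiverseMemoryBank`), the confidence and similarity thresholds, and the `detect`/`track`/`embed` callables are illustrative assumptions, not the authors' actual API or released code.

```python
import numpy as np
from dataclasses import dataclass, field


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


@dataclass
class DiverseMemoryBank:
    """Keeps a small set of credible, mutually dissimilar frame memories."""
    capacity: int = 8
    entries: list = field(default_factory=list)  # list of (embedding, mask) tuples

    def maybe_add(self, embedding, mask, confidence,
                  tau_conf: float = 0.8, tau_sim: float = 0.9) -> None:
        if confidence < tau_conf:
            return  # only credible predictions may enter the memory bank
        if any(cosine_sim(embedding, e) > tau_sim for e, _ in self.entries):
            return  # near-duplicate of an existing memory; keep the bank diverse
        self.entries.append((embedding, mask))
        if len(self.entries) > self.capacity:
            self.entries.pop(0)  # evict the oldest memory


def refer_and_track(frames, text_query, detect, track, embed,
                    tau_detect: float = 0.7):
    """Stage 1: run text-referred detection until a credible initial frame appears.
    Stage 2: switch to tracking, conditioning each frame on the memory bank."""
    memory = DiverseMemoryBank()
    masks, tracking = [], False
    for frame in frames:
        if not tracking:
            mask, conf = detect(frame, text_query)
            if conf >= tau_detect:  # credible initial frame found, start tracking
                tracking = True
                memory.maybe_add(embed(frame), mask, conf)
        else:
            mask, conf, emb = track(frame, memory.entries)
            memory.maybe_add(emb, mask, conf)
        masks.append(mask)  # masks before the credible frame come from raw detection
    return masks
```

In this sketch, `detect` stands in for the cross-modal spatial-temporal detection stage and `track` for the SAM2-style memory-conditioned tracker; the key idea it illustrates is that memory entries are gated by both a confidence threshold (credibility) and a dissimilarity check (diversity), which is what allows the bank to stay informative over long videos.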
