Segment Anything Meets Point Tracking
July 3, 2023
Authors: Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu
cs.AI
Abstract
The Segment Anything Model (SAM) has established itself as a powerful
zero-shot image segmentation model, employing interactive prompts such as
points to generate masks. This paper presents SAM-PT, a method extending SAM's
capability to tracking and segmenting anything in dynamic videos. SAM-PT
leverages robust and sparse point selection and propagation techniques for mask
generation, demonstrating that a SAM-based segmentation tracker can yield
strong zero-shot performance across popular video object segmentation
benchmarks, including DAVIS, YouTube-VOS, and MOSE. Compared to traditional
object-centric mask propagation strategies, we uniquely use point propagation
to exploit local structure information that is agnostic to object semantics. We
highlight the merits of point-based tracking through direct evaluation on the
zero-shot open-world Unidentified Video Objects (UVO) benchmark. To further
enhance our approach, we utilize K-Medoids clustering for point initialization
and track both positive and negative points to clearly distinguish the target
object. We also employ multiple mask decoding passes for mask refinement and
devise a point re-initialization strategy to improve tracking accuracy. Our
code integrates different point trackers and video segmentation benchmarks and
will be released at https://github.com/SysCV/sam-pt.
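
The abstract outlines a concrete pipeline: initialize query points on the first-frame mask with K-Medoids, propagate positive and negative points with a point tracker, and prompt SAM with the tracked points, refining the mask over multiple decoding passes. Below is a minimal sketch of that loop, not the authors' released implementation. Assumptions: `track_points_fn` is a hypothetical stand-in for any long-term point tracker (e.g., PIPS or CoTracker), K-Medoids comes from scikit-learn-extra, the SAM calls use the public `segment_anything` `SamPredictor` API, and the point counts, pass count, and subsampling threshold are illustrative. The point re-initialization strategy is omitted for brevity.

```python
import numpy as np
from sklearn_extra.cluster import KMedoids
from segment_anything import sam_model_registry, SamPredictor


def init_query_points(mask: np.ndarray, n_points: int = 8) -> np.ndarray:
    """Pick query points inside a binary mask via K-Medoids, so each
    selected point lies on an actual pixel of the region (illustrative
    stand-in for the paper's point initialization)."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys], axis=1).astype(np.float32)
    if len(coords) > 5000:  # subsample so clustering stays tractable
        idx = np.random.default_rng(0).choice(len(coords), 5000, replace=False)
        coords = coords[idx]
    km = KMedoids(n_clusters=n_points, random_state=0).fit(coords)
    return km.cluster_centers_  # (n_points, 2) in (x, y) order


def segment_video(frames, first_mask, track_points_fn, sam_checkpoint,
                  n_refine_passes: int = 3):
    """frames: list of HxWx3 uint8 RGB arrays; first_mask: HxW bool mask."""
    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    predictor = SamPredictor(sam)

    # Positive points on the target, negative points on the background
    # (a simple approximation of the paper's negative-point sampling).
    pos = init_query_points(first_mask)
    neg = init_query_points(~first_mask.astype(bool))
    points = np.concatenate([pos, neg], axis=0)
    labels = np.array([1] * len(pos) + [0] * len(neg))

    # `track_points_fn` is hypothetical: it propagates the query points
    # through the video, returning (n_frames, n_points, 2) trajectories
    # and an (n_frames, n_points) visibility mask.
    tracks, visible = track_points_fn(frames, points)

    masks = []
    for t, frame in enumerate(frames):
        predictor.set_image(frame)
        vis = visible[t]
        # Multiple decoding passes: feed the previous low-res mask logits
        # back into SAM to iteratively refine the prediction.
        mask_input = None
        for _ in range(n_refine_passes):
            m, scores, logits = predictor.predict(
                point_coords=tracks[t][vis],
                point_labels=labels[vis],
                mask_input=mask_input,
                multimask_output=False,
            )
            mask_input = logits[np.argmax(scores)][None]  # (1, 256, 256)
        masks.append(m[0])
    return masks
```

Because prompts here are tracked points rather than a propagated mask, the sketch reflects the abstract's key design choice: the tracker only needs local structure around each point, independent of object semantics, while SAM converts the surviving visible points back into a dense mask on every frame.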