Segment Anything Meets Point Tracking
July 3, 2023
Authors: Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu
cs.AI
Abstract
The Segment Anything Model (SAM) has established itself as a powerful
zero-shot image segmentation model, employing interactive prompts such as
points to generate masks. This paper presents SAM-PT, a method extending SAM's
capability to tracking and segmenting anything in dynamic videos. SAM-PT
leverages robust and sparse point selection and propagation techniques for mask
generation, demonstrating that a SAM-based segmentation tracker can yield
strong zero-shot performance across popular video object segmentation
benchmarks, including DAVIS, YouTube-VOS, and MOSE. Compared to traditional
object-centric mask propagation strategies, we uniquely use point propagation
to exploit local structure information that is agnostic to object semantics. We
highlight the merits of point-based tracking through direct evaluation on the
zero-shot open-world Unidentified Video Objects (UVO) benchmark. To further
enhance our approach, we utilize K-Medoids clustering for point initialization
and track both positive and negative points to clearly distinguish the target
object. We also employ multiple mask decoding passes for mask refinement and
devise a point re-initialization strategy to improve tracking accuracy. Our
code integrates different point trackers and video segmentation benchmarks and
will be released at https://github.com/SysCV/sam-pt.
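
The abstract outlines a concrete pipeline: initialize query points on the first-frame mask with K-Medoids, propagate positive and negative points with a point tracker, and prompt SAM with the tracked points, refining the mask over multiple decoding passes. Below is a minimal sketch of that loop, not the authors' released implementation. Assumptions: `track_points_fn` is a hypothetical stand-in for any long-term point tracker (e.g., PIPS or CoTracker), K-Medoids comes from scikit-learn-extra, the SAM calls use the public `segment_anything` `SamPredictor` API, and the point counts, pass count, and subsampling threshold are illustrative. The point re-initialization strategy is omitted for brevity.

```python
import numpy as np
from sklearn_extra.cluster import KMedoids
from segment_anything import sam_model_registry, SamPredictor


def init_query_points(mask: np.ndarray, n_points: int = 8) -> np.ndarray:
    """Pick query points inside a binary mask via K-Medoids, so each
    selected point lies on an actual pixel of the region (illustrative
    stand-in for the paper's point initialization)."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys], axis=1).astype(np.float32)
    if len(coords) > 5000:  # subsample so clustering stays tractable
        idx = np.random.default_rng(0).choice(len(coords), 5000, replace=False)
        coords = coords[idx]
    km = KMedoids(n_clusters=n_points, random_state=0).fit(coords)
    return km.cluster_centers_  # (n_points, 2) in (x, y) order


def segment_video(frames, first_mask, track_points_fn, sam_checkpoint,
                  n_refine_passes: int = 3):
    """frames: list of HxWx3 uint8 RGB arrays; first_mask: HxW bool mask."""
    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    predictor = SamPredictor(sam)

    # Positive points on the target, negative points on the background
    # (a simple approximation of the paper's negative-point sampling).
    pos = init_query_points(first_mask)
    neg = init_query_points(~first_mask.astype(bool))
    points = np.concatenate([pos, neg], axis=0)
    labels = np.array([1] * len(pos) + [0] * len(neg))

    # `track_points_fn` is hypothetical: it propagates the query points
    # through the video, returning (n_frames, n_points, 2) trajectories
    # and an (n_frames, n_points) visibility mask.
    tracks, visible = track_points_fn(frames, points)

    masks = []
    for t, frame in enumerate(frames):
        predictor.set_image(frame)
        vis = visible[t]
        # Multiple decoding passes: feed the previous low-res mask logits
        # back into SAM to iteratively refine the prediction.
        mask_input = None
        for _ in range(n_refine_passes):
            m, scores, logits = predictor.predict(
                point_coords=tracks[t][vis],
                point_labels=labels[vis],
                mask_input=mask_input,
                multimask_output=False,
            )
            mask_input = logits[np.argmax(scores)][None]  # (1, 256, 256)
        masks.append(m[0])
    return masks
```

Because prompts here are tracked points rather than a propagated mask, the sketch reflects the abstract's key design choice: the tracker only needs local structure around each point, independent of object semantics, while SAM converts the surviving visible points back into a dense mask on every frame.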