Segment Anything Meets Point Tracking
July 3, 2023
Authors: Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu
cs.AI
Abstract
The Segment Anything Model (SAM) has established itself as a powerful
zero-shot image segmentation model, employing interactive prompts such as
points to generate masks. This paper presents SAM-PT, a method extending SAM's
capability to tracking and segmenting anything in dynamic videos. SAM-PT
leverages robust and sparse point selection and propagation techniques for mask
generation, demonstrating that a SAM-based segmentation tracker can yield
strong zero-shot performance across popular video object segmentation
benchmarks, including DAVIS, YouTube-VOS, and MOSE. Compared to traditional
object-centric mask propagation strategies, we uniquely use point propagation
to exploit local structure information that is agnostic to object semantics. We
highlight the merits of point-based tracking through direct evaluation on the
zero-shot open-world Unidentified Video Objects (UVO) benchmark. To further
enhance our approach, we utilize K-Medoids clustering for point initialization
and track both positive and negative points to clearly distinguish the target
object. We also employ multiple mask decoding passes for mask refinement and
devise a point re-initialization strategy to improve tracking accuracy. Our
code integrates different point trackers and video segmentation benchmarks and
will be released at https://github.com/SysCV/sam-pt.
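Two of the components named in the abstract are concrete enough to sketch: K-Medoids point initialization (medoids are actual mask pixels, so every selected query point lies on the object) and the positive/negative point labeling that SAM's point-prompt convention uses (label 1 for object points, 0 for background points). The sketch below is illustrative, not the authors' implementation; the function names and the choice of uniformly sampled background pixels for negative points are our assumptions.

```python
import numpy as np

def kmedoids(pts, k, iters=10, seed=0):
    """Toy K-Medoids over 2-D pixel coordinates. Unlike K-Means centroids,
    each medoid is an actual input point, so it is guaranteed to fall
    inside the object mask."""
    rng = np.random.default_rng(seed)
    medoids = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(iters):
        # Assign every pixel to its nearest medoid.
        d = np.linalg.norm(pts[:, None].astype(float) - medoids[None], axis=-1)
        labels = d.argmin(axis=1)
        # Move each medoid to the cluster member minimizing total distance.
        for j in range(k):
            members = pts[labels == j]
            if len(members):
                dd = np.linalg.norm(
                    members[:, None].astype(float) - members[None], axis=-1)
                medoids[j] = members[dd.sum(axis=1).argmin()]
    return medoids

def init_query_points(mask, k_pos=8, k_neg=4, seed=0):
    """Return (points, labels) in SAM's prompt convention:
    label 1 = positive (object) point, label 0 = negative (background) point.
    Positive points come from K-Medoids on the mask; negative points are
    sampled uniformly from the background (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    pos = kmedoids(np.argwhere(mask), k_pos, seed=seed)
    bg = np.argwhere(~mask)
    neg = bg[rng.choice(len(bg), size=k_neg, replace=False)]
    points = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(k_pos, int), np.zeros(k_neg, int)])
    return points, labels
```

In the full method these query points would then be propagated through the video by a point tracker, and the tracked points (with their positive/negative labels) would prompt SAM in each frame to decode a mask.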