Segment Anything Meets Point Tracking

Abstract

The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, employing interactive prompts such as points to generate masks. This paper presents SAM-PT, a method extending SAM's capability to tracking and segmenting anything in dynamic videos. SAM-PT leverages robust and sparse point selection and propagation techniques for mask generation, demonstrating that a SAM-based segmentation tracker can yield strong zero-shot performance across popular video object segmentation benchmarks, including DAVIS, YouTube-VOS, and MOSE. Compared to traditional object-centric mask propagation strategies, we uniquely use point propagation to exploit local structure information that is agnostic to object semantics. We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark. To further enhance our approach, we utilize K-Medoids clustering for point initialization and track both positive and negative points to clearly distinguish the target object. We also employ multiple mask decoding passes for mask refinement and devise a point re-initialization strategy to improve tracking accuracy. Our code integrates different point trackers and video segmentation benchmarks and will be released at https://github.com/SysCV/sam-pt.
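
As an illustration of the point-initialization idea mentioned above, the following is a minimal sketch of how K-Medoids clustering could pick a small set of well-spread query points inside a ground-truth mask. It is not the authors' implementation: the function name `kmedoids_points`, its parameters (`k`, `n_iter`, `max_candidates`, `seed`), and the simple alternating update scheme are all assumptions made for this example, and only NumPy is used.

```python
import numpy as np


def kmedoids_points(mask, k=4, n_iter=20, max_candidates=500, seed=0):
    """Pick k points inside a binary mask via a simple alternating k-medoids.

    Hypothetical helper for illustration only; returns an (k, 2) integer
    array of (x, y) pixel coordinates that all lie on the mask.
    """
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys], axis=1).astype(np.float64)  # (N, 2) as (x, y)
    if len(coords) < k:
        raise ValueError("mask has fewer foreground pixels than k")

    # Subsample foreground pixels so the pairwise distance matrix stays small.
    if len(coords) > max_candidates:
        keep = rng.choice(len(coords), max_candidates, replace=False)
        coords = coords[keep]

    # Pairwise Euclidean distances between candidate pixels.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Start from k distinct random candidates.
    medoids = rng.choice(len(coords), k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each candidate joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.nonzero(labels == c)[0]
            if members.size == 0:
                continue
            # Update step: the new medoid is the member with the smallest
            # total distance to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids

    return coords[medoids].astype(int)


# Example: a rectangular object in a 64x64 frame.
example_mask = np.zeros((64, 64), dtype=bool)
example_mask[10:40, 20:50] = True
example_points = kmedoids_points(example_mask, k=4)
```

Because medoids are always actual mask pixels, every selected point is guaranteed to lie on the object, which is not true of centroid-based methods such as K-Means on non-convex masks. Under the same assumptions, negative points could be chosen by running the function on the inverted mask or a band around the object.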