SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Abstract

The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly when managing crowded scenes with fast-moving or self-occluding objects. Furthermore, the fixed-window memory approach in the original model does not consider the quality of the memories selected to condition the image features for the next frame, leading to error propagation in videos. This paper introduces SAMURAI, an enhanced adaptation of SAM 2 specifically designed for visual object tracking. By incorporating temporal motion cues with the proposed motion-aware memory selection mechanism, SAMURAI effectively predicts object motion and refines mask selection, achieving robust, accurate tracking without the need for retraining or fine-tuning. SAMURAI operates in real time and demonstrates strong zero-shot performance across diverse benchmark datasets, showcasing its ability to generalize without fine-tuning. In evaluations, SAMURAI achieves significant improvements in success rate and precision over existing trackers, with a 7.1% AUC gain on LaSOT_{ext} and a 3.5% AO gain on GOT-10k. Moreover, it achieves competitive results compared to fully supervised methods on LaSOT, underscoring its robustness in complex tracking scenarios and its potential for real-world applications in dynamic environments. Code and results are available at https://github.com/yangchris11/samurai.
