MambaEVT: Event Stream based Visual Object Tracking using State Space Model

Abstract

Event camera-based visual tracking has drawn increasing attention in recent years, owing to the camera's unique imaging principle and its low energy consumption, high dynamic range, and dense temporal resolution. Current event-based tracking algorithms are gradually hitting performance bottlenecks because they rely on vision Transformers and on a static template for target object localization. In this paper, we propose a novel Mamba-based visual tracking framework that adopts a state space model with linear complexity as its backbone network. The search regions and target template are fed into the vision Mamba network for simultaneous feature extraction and interaction. The output tokens of the search regions are then fed into the tracking head for target localization. More importantly, we consider introducing a dynamic template update strategy into the tracking framework using the Memory Mamba network. By considering the diversity of samples in the target template library and making appropriate adjustments to the template memory module, a more effective dynamic template can be integrated. The effective combination of dynamic and static templates allows our Mamba-based tracking algorithm to achieve a good balance between accuracy and computational cost on multiple large-scale datasets, including EventVOT, VisEvent, and FE240hz. The source code will be released at https://github.com/Event-AHU/MambaEVT
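To make the linear-complexity claim concrete, the following is a minimal sketch of a state space scan over template tokens followed by search tokens, with only the search outputs kept for a downstream head. It is not the MambaEVT implementation: Mamba uses input-dependent (selective) parameters, whereas this sketch assumes fixed per-channel weights, and the names `ssm_scan`, `search_outputs`, `a`, `b`, and `c` are hypothetical, chosen only for illustration.

```python
def ssm_scan(tokens, a, b, c):
    """Fixed-parameter diagonal SSM with one scalar state per channel.

    tokens: list of T tokens, each a list of D floats.
    a, b, c: per-channel decay, input, and output weights (length D).
    Each token updates the state exactly once, so cost is linear in T.
    """
    h = [0.0] * len(a)
    outputs = []
    for x in tokens:
        # State update per channel: h_t = a * h_{t-1} + b * x_t
        h = [a_d * h_d + b_d * x_d for a_d, h_d, b_d, x_d in zip(a, h, b, x)]
        # Readout per channel: y_t = c * h_t
        outputs.append([c_d * h_d for c_d, h_d in zip(c, h)])
    return outputs


def search_outputs(template, search, a, b, c):
    """Scan template tokens, then search tokens; keep only the search outputs."""
    y = ssm_scan(template + search, a, b, c)
    return y[len(template):]


# Toy data (assumed values, two channels per token).
template = [[1.0, 0.0], [0.0, 1.0]]
search = [[0.5, 0.5], [0.0, 0.0], [1.0, -1.0]]
a, b, c = [0.9, 0.5], [1.0, 1.0], [1.0, 2.0]

out = search_outputs(template, search, a, b, c)
print(len(out), len(out[0]))
```

Because the template tokens are scanned first, the state that reaches the search tokens already carries template information, which is one simple way to picture feature extraction and interaction happening in the same pass.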
