MambaEVT: Event Stream based Visual Object Tracking using State Space Model

Abstract

Event camera-based visual tracking has attracted growing attention in recent years, owing to the camera's unique imaging principle and its advantages of low energy consumption, high dynamic range, and dense temporal resolution. Current event-based tracking algorithms are gradually reaching a performance bottleneck because they rely on vision Transformers and a static template for target object localization. In this paper, we propose a novel Mamba-based visual tracking framework that adopts a state space model with linear complexity as its backbone network. The search regions and the target template are fed into the vision Mamba network for simultaneous feature extraction and interaction. The output tokens of the search regions are then fed into the tracking head for target localization. More importantly, we introduce a dynamic template update strategy into the tracking framework using the Memory Mamba network. By considering the diversity of samples in the target template library and making appropriate adjustments to the template memory module, a more effective dynamic template can be integrated. The effective combination of dynamic and static templates allows our Mamba-based tracking algorithm to achieve a good balance between accuracy and computational cost on multiple large-scale datasets, including EventVOT, VisEvent, and FE240hz. The source code will be released at https://github.com/Event-AHU/MambaEVT.
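
To make the "linear complexity" and "simultaneous feature extraction and interaction" claims concrete, the following is a minimal sketch of a plain discrete state space recurrence run over concatenated template and search tokens. It is not the authors' implementation: the function name `ssm_scan`, the scalar parameters `a`, `b`, `c`, and the toy token values are all assumptions made for illustration, and an actual Mamba block uses learned, input-dependent (selective) parameters rather than fixed scalars.

```python
def ssm_scan(tokens, a, b, c):
    """Per-channel linear state space recurrence (illustrative, hypothetical helper).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    dim = len(tokens[0])
    h = [0.0] * dim                      # hidden state starts at zero
    outputs = []
    for x in tokens:                     # one fixed-cost update per token: O(sequence length)
        h = [a * h_i + b * x_i for h_i, x_i in zip(h, x)]
        outputs.append([c * h_i for h_i in h])
    return outputs


# Toy tokens (assumed values): template tokens activate channel 0, search tokens channel 1.
template = [[1.0, 0.0], [1.0, 0.0]]
search = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

# Template and search tokens are processed as one sequence.
tokens = template + search
outputs = ssm_scan(tokens, a=0.5, b=1.0, c=2.0)

# Only the search-region outputs would be handed to a tracking head.
search_out = outputs[len(template):]
print(search_out)
```

In this toy run, channel 0 of every search output is non-zero even though the search tokens themselves are zero in that channel. The template information reaches the search tokens through the shared hidden state, which is the sense in which a single pass can perform extraction and interaction together, at a cost that grows linearly with the number of tokens.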
