Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

Abstract

Single-stream architectures using Vision Transformer (ViT) backbones have recently shown great potential for real-time UAV tracking. However, frequent occlusions from obstacles such as buildings and trees expose a major drawback: these models often lack strategies for handling occlusion effectively. New methods are needed to enhance the occlusion resilience of single-stream ViT models in aerial tracking. In this work, we propose to learn Occlusion-Robust Representations (ORR) based on ViTs for UAV tracking by enforcing invariance of the target's feature representation with respect to random masking operations modeled by a spatial Cox process. The intent is that this random masking approximately simulates target occlusions, thereby enabling us to learn ViTs that are robust to target occlusion in UAV tracking. This framework is termed ORTrack. Additionally, to facilitate real-time applications, we propose an Adaptive Feature-Based Knowledge Distillation (AFKD) method to create a more compact tracker, which adaptively mimics the behavior of the teacher model ORTrack according to the task's difficulty. This student model, dubbed ORTrack-D, retains much of ORTrack's performance while offering higher efficiency. Extensive experiments on multiple benchmarks validate the effectiveness of our method and demonstrate its state-of-the-art performance. Code is available at https://github.com/wuyou3474/ORTrack.
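To make the masking idea concrete, the following is a minimal sketch of how a random patch mask could be drawn from a spatial Cox process (a Poisson process whose intensity is itself random) and paired with an invariance penalty. It is an illustration under stated assumptions, not the ORTrack implementation. The function names, the i.i.d. Gamma-distributed intensity per patch, the default parameters, and the mean-squared-error penalty are all hypothetical choices that the abstract does not specify.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def cox_process_mask(grid_h, grid_w, mean_intensity=0.3, shape=2.0, seed=None):
    """Return a grid_h x grid_w nested list of booleans; True marks a masked patch.

    Assumption: each patch gets an i.i.d. Gamma-distributed random intensity with
    the given mean. The intensity field used in the paper may be different.
    """
    rng = random.Random(seed)
    scale = mean_intensity / shape  # Gamma mean equals shape * scale
    mask = []
    for _ in range(grid_h):
        row = []
        for _ in range(grid_w):
            lam = rng.gammavariate(shape, scale)       # step 1: draw the random intensity
            row.append(sample_poisson(lam, rng) > 0)   # step 2: Poisson count given that intensity
        mask.append(row)
    return mask


def invariance_loss(features_clean, features_masked):
    """Mean squared error between two equal-length feature vectors (hypothetical penalty)."""
    squared = [(a - b) ** 2 for a, b in zip(features_clean, features_masked)]
    return sum(squared) / len(squared)
```

In a training loop, one would presumably extract target features from the unmasked and the masked template, then add `invariance_loss` to the tracking loss so that the representation changes little when patches are hidden. A patch is masked here whenever at least one point of the process falls in it, so the expected mask ratio grows with `mean_intensity`.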
