TAPTRv2: Attention-based Position Update Improves Tracking Any Point
July 23, 2024
Authors: Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Feng Li, Tianhe Ren, Bohan Li, Lei Zhang
cs.AI
Abstract
In this paper, we present TAPTRv2, a Transformer-based approach built upon
TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from
DEtection TRansformer (DETR) and formulates each tracking point as a point
query, making it possible to leverage well-studied operations in DETR-like
algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its
reliance on cost-volume, which contaminates the point query's content feature
and negatively impacts both visibility prediction and cost-volume computation.
In TAPTRv2, we propose a novel attention-based position update (APU) operation
and use key-aware deformable attention to realize it. For each query, this
operation uses key-aware attention weights to combine their corresponding
deformable sampling positions to predict a new query position. This design is
based on the observation that local attention is essentially the same as
cost-volume, both of which are computed by dot-production between a query and
its surrounding features. By introducing this new operation, TAPTRv2 not only
removes the extra burden of cost-volume computation, but also leads to a
substantial performance improvement. TAPTRv2 surpasses TAPTR and achieves
state-of-the-art performance on many challenging datasets, demonstrating the
superiority.
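The core idea of the attention-based position update (APU) can be sketched numerically: key-aware attention weights are dot products between the query's content feature and the features sampled at each deformable sampling position, and the new query position is the attention-weighted combination of those positions. The following NumPy snippet is a minimal illustrative sketch; the function name and shapes are assumptions, not the paper's actual Transformer-decoder implementation.

```python
import numpy as np

def attention_position_update(query_feat, sampled_feats, sampled_positions):
    """Sketch of the attention-based position update (APU).

    query_feat:        (d,)   point query's content feature
    sampled_feats:     (n, d) features at the deformable sampling positions (keys)
    sampled_positions: (n, 2) the (x, y) sampling positions themselves
    """
    # Key-aware attention logits: dot product of the query with each sampled key.
    # This is the same dot-product form a local cost-volume would compute.
    logits = sampled_feats @ query_feat            # shape (n,)
    # Numerically stabilized softmax turns logits into attention weights.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # New position: attention-weighted combination of the sampling positions.
    return weights @ sampled_positions             # shape (2,)

# Toy example: two sampling positions with identical key features receive
# equal attention weights, so the updated position is their midpoint.
q = np.array([1.0, 0.0])
feats = np.array([[1.0, 0.0], [1.0, 0.0]])
pos = np.array([[0.0, 0.0], [2.0, 2.0]])
print(attention_position_update(q, feats, pos))    # -> [1. 1.]
```

Because the weights come from the same query-feature dot products that a cost-volume would use, this update replaces the explicit cost-volume while keeping the query's content feature untouched.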