TAPTRv2: Attention-based Position Update Improves Tracking Any Point
July 23, 2024
Authors: Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Feng Li, Tianhe Ren, Bohan Li, Lei Zhang
cs.AI
Abstract
In this paper, we present TAPTRv2, a Transformer-based approach built upon
TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from
DEtection TRansformer (DETR) and formulates each tracking point as a point
query, making it possible to leverage well-studied operations in DETR-like
algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its
reliance on cost-volume, which contaminates the point query's content feature
and negatively impacts both visibility prediction and cost-volume computation.
In TAPTRv2, we propose a novel attention-based position update (APU) operation
and use key-aware deformable attention to realize it. For each query, this
operation uses key-aware attention weights to combine their corresponding
deformable sampling positions to predict a new query position. This design is
based on the observation that local attention is essentially the same as
cost-volume, both of which are computed by a dot product between a query and
its surrounding features. By introducing this new operation, TAPTRv2 not only
removes the extra burden of cost-volume computation, but also leads to a
substantial performance improvement. TAPTRv2 surpasses TAPTR and achieves
state-of-the-art performance on many challenging datasets, demonstrating the
superiority.
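The attention-based position update described in the abstract can be illustrated with a minimal NumPy sketch. This is a simplified, single-query illustration of the idea, not the paper's implementation: the function name, shapes, and scaling are assumptions, and the real model operates on batched multi-head features inside a Transformer decoder. It shows the two steps the abstract names: key-aware attention weights computed as dot products between the query and features sampled at deformable positions (the quantity the abstract identifies with a local cost-volume), and the new query position predicted as the attention-weighted combination of those sampling positions.

```python
import numpy as np

def attention_based_position_update(query_feat, key_feats, sampling_positions):
    """Illustrative sketch of an attention-based position update (APU).

    query_feat:         (C,)   content feature of one point query
    key_feats:          (K, C) features sampled at K deformable positions
    sampling_positions: (K, 2) the K deformable sampling positions (x, y)

    Returns the predicted new (x, y) position of the query.
    """
    # Key-aware attention logits: dot product between the query and each
    # sampled key feature. As the abstract notes, this local attention is
    # essentially the same quantity as a local cost-volume.
    scale = np.sqrt(query_feat.shape[0])
    logits = key_feats @ query_feat / scale

    # Softmax over the K samples (subtract max for numerical stability).
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # New position: attention-weighted combination of sampling positions.
    return weights @ sampling_positions
```

Because the weights are a softmax, the predicted position is a convex combination of the sampling positions, so the update always stays within the local neighborhood the deformable samples cover.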