Local All-Pair Correspondence for Point Tracking
July 22, 2024
Authors: Seokju Cho, Jiahui Huang, Jisu Nam, Honggyu An, Seungryong Kim, Joon-Young Lee
cs.AI
Abstract
We introduce LocoTrack, a highly accurate and efficient model designed for
the task of tracking any point (TAP) across video sequences. Previous
approaches in this task often rely on local 2D correlation maps to establish
correspondences from a point in the query image to a local region in the target
image, which often struggle with homogeneous regions or repetitive features,
leading to matching ambiguities. LocoTrack overcomes this challenge with a
novel approach that utilizes all-pair correspondences across regions, i.e.,
local 4D correlation, to establish precise correspondences, with bidirectional
correspondence and matching smoothness significantly enhancing robustness
against ambiguities. We also incorporate a lightweight correlation encoder to
enhance computational efficiency, and a compact Transformer architecture to
integrate long-term temporal information. LocoTrack achieves unmatched accuracy
on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than
the current state-of-the-art.
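To make the core idea concrete, below is a minimal, self-contained sketch of what a local 4D correlation volume looks like: for one query point, all-pair similarities are computed between a small window around the point in the query frame and a small window around the current track estimate in the target frame. This is an illustrative NumPy sketch, not the authors' implementation; the function name, window radius, and cosine normalization are assumptions for clarity.

```python
import numpy as np

def local_4d_correlation(feat_query, feat_target, q_xy, p_xy, radius=3):
    """Illustrative sketch (not LocoTrack's actual code) of a local 4D correlation.

    feat_query, feat_target: (H, W, C) feature maps.
    q_xy: (x, y) query point in the query frame.
    p_xy: (x, y) current track estimate in the target frame.
    Returns a (k, k, k, k) volume with k = 2*radius + 1, holding the cosine
    similarity between every pair of positions drawn from the two local
    windows, i.e. region-to-region all-pair correspondence.
    """
    k = 2 * radius + 1

    def window(feat, cx, cy):
        # Crop a k x k patch centred on (cx, cy); assumes it lies inside the map.
        patch = feat[cy - radius: cy + radius + 1, cx - radius: cx + radius + 1]
        patch = patch.reshape(k * k, -1)
        # L2-normalize so the dot product below is a cosine similarity.
        return patch / (np.linalg.norm(patch, axis=-1, keepdims=True) + 1e-6)

    src = window(feat_query, *q_xy)    # (k*k, C) features around the query point
    tgt = window(feat_target, *p_xy)   # (k*k, C) features around the target estimate

    corr = src @ tgt.T                 # (k*k, k*k) all-pair similarities
    return corr.reshape(k, k, k, k)    # local 4D correlation volume


# Toy usage with random features: query at (16, 16), target estimate at (18, 17).
fq = np.random.randn(32, 32, 64).astype(np.float32)
ft = np.random.randn(32, 32, 64).astype(np.float32)
vol = local_4d_correlation(fq, ft, (16, 16), (18, 17), radius=3)
print(vol.shape)  # (7, 7, 7, 7)
```

Compared with a 2D correlation map (one query vector against a target window), this volume relates every position in the source window to every position in the target window, which is what allows bidirectional consistency and matching smoothness to disambiguate homogeneous or repetitive regions.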