Local All-Pair Correspondence for Point Tracking
July 22, 2024
Authors: Seokju Cho, Jiahui Huang, Jisu Nam, Honggyu An, Seungryong Kim, Joon-Young Lee
cs.AI
Abstract
We introduce LocoTrack, a highly accurate and efficient model designed for
the task of tracking any point (TAP) across video sequences. Previous
approaches in this task often rely on local 2D correlation maps to establish
correspondences from a point in the query image to a local region in the target
image, which often struggle with homogeneous regions or repetitive features,
leading to matching ambiguities. LocoTrack overcomes this challenge with a
novel approach that utilizes all-pair correspondences across regions, i.e.,
local 4D correlation, to establish precise correspondences, with bidirectional
correspondence and matching smoothness significantly enhancing robustness
against ambiguities. We also incorporate a lightweight correlation encoder to
enhance computational efficiency, and a compact Transformer architecture to
integrate long-term temporal information. LocoTrack achieves unmatched accuracy
on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than
the current state-of-the-art.
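To make the "local 4D correlation" idea concrete: instead of correlating a single query feature against a local target window (a 2D map), LocoTrack correlates every feature in a window around the query point against every feature in a window around the target estimate, yielding a 4D volume. The sketch below is a minimal NumPy illustration of this construction under assumed conventions (zero-padding at borders, raw dot-product similarity); it is not the authors' implementation, and the function name and signature are hypothetical.

```python
import numpy as np

def local_4d_correlation(feat_q, feat_t, q_center, t_center, r=3):
    """Illustrative local 4D correlation volume (not the paper's code).

    feat_q, feat_t: (H, W, C) feature maps of the query/target frames.
    q_center, t_center: (row, col) of the query point and its current
    estimate in the target frame. r: local window radius.
    Returns a (k, k, k, k) volume with k = 2r + 1, holding the dot
    product between every pair of features from the two local windows.
    """
    k = 2 * r + 1

    def window(feat, center):
        rr, cc = center
        H, W, C = feat.shape
        out = np.zeros((k, k, C), dtype=feat.dtype)
        for i in range(-r, r + 1):
            for j in range(-r, r + 1):
                y, x = rr + i, cc + j
                if 0 <= y < H and 0 <= x < W:  # zero-pad outside the map
                    out[i + r, j + r] = feat[y, x]
        return out

    wq = window(feat_q, q_center)  # (k, k, C) window around the query point
    wt = window(feat_t, t_center)  # (k, k, C) window around the target estimate
    # All-pair inner products: corr[i, j, u, v] = <wq[i, j], wt[u, v]>
    return np.einsum('ijc,uvc->ijuv', wq, wt)
```

Compared with a 2D map (only the query center vs. the target window, i.e. the `corr[r, r]` slice of this volume), the full 4D volume exposes both directions of correspondence and the smoothness of matches across neighboring points, which is what the abstract credits for the robustness to homogeneous and repetitive regions.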