SpatialTrackerV2: 3D Point Tracking Made Easy
July 16, 2025
作者: Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Yuri Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou
cs.AI
Abstract
We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing, feed-forward 3D point tracker. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30% and matches the accuracy of leading dynamic 3D reconstruction approaches while running 50 times faster.