SpatialTrackerV2: 3D Point Tracking Made Easy
July 16, 2025
Authors: Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Yuri Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou
cs.AI
Abstract
We present SpatialTrackerV2, a feed-forward 3D point tracking method for
monocular videos. Going beyond modular pipelines built on off-the-shelf
components for 3D tracking, our approach unifies the intrinsic connections
between point tracking, monocular depth, and camera pose estimation into a
high-performing and feedforward 3D point tracker. It decomposes world-space 3D
motion into scene geometry, camera ego-motion, and pixel-wise object motion,
with a fully differentiable and end-to-end architecture, allowing scalable
training across a wide range of datasets, including synthetic sequences, posed
RGB-D videos, and unlabeled in-the-wild footage. By learning geometry and
motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms
existing 3D tracking methods by 30%, and matches the accuracy of leading
dynamic 3D reconstruction approaches while running 50× faster.
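
A rough sketch of the stated decomposition (notation is ours, not taken from the paper): for a query pixel u at time t, with predicted depth d(u, t), camera intrinsics K, and camera-to-world pose T(t), one way to write the composed world-space track is

X_world(u, t) = T(t) · ( d(u, t) · K^{-1} · [u; 1] ) + Δ_obj(u, t),

where the first term combines scene geometry (depth) with camera ego-motion (pose), and Δ_obj(u, t) is the pixel-wise object motion, which is zero for static points.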