SpatialTracker: Tracking Any 2D Pixels in 3D Space
April 5, 2024
作者: Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou
cs.AI
Abstract
Recovering dense and long-range pixel motion in videos is a challenging
problem. Part of the difficulty arises from the 3D-to-2D projection process,
leading to occlusions and discontinuities in the 2D motion domain. While 2D
motion can be intricate, we posit that the underlying 3D motion can often be
simple and low-dimensional. In this work, we propose to estimate point
trajectories in 3D space to mitigate the issues caused by image projection. Our
method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth
estimators, represents the 3D content of each frame efficiently using a
triplane representation, and performs iterative updates using a transformer to
estimate 3D trajectories. Tracking in 3D allows us to leverage
as-rigid-as-possible (ARAP) constraints while simultaneously learning a
rigidity embedding that clusters pixels into different rigid parts. Extensive
evaluation shows that our approach achieves state-of-the-art tracking
performance both qualitatively and quantitatively, particularly in challenging
scenarios such as out-of-plane rotation.
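Two ideas from the abstract can be made concrete with a small sketch: lifting 2D pixels to 3D by unprojecting them with a per-pixel depth map and camera intrinsics, and the as-rigid-as-possible (ARAP) principle that pairwise distances between points on the same rigid part should stay constant over time. This is a minimal illustration only, not the paper's implementation; the function names, the toy intrinsics `K`, and the constant-depth values are assumptions for the example.

```python
import numpy as np

def lift_to_3d(pixels, depth, K):
    """Unproject 2D pixels into 3D camera space.

    pixels: (N, 2) array of (u, v) pixel coordinates
    depth:  (N,) per-pixel depths (e.g. from a monocular depth estimator)
    K:      (3, 3) camera intrinsics matrix
    """
    ones = np.ones((pixels.shape[0], 1))
    homo = np.hstack([pixels, ones])       # homogeneous pixel coordinates
    rays = homo @ np.linalg.inv(K).T      # back-project through the camera
    return rays * depth[:, None]          # scale each ray by its depth

def arap_residual(pts_t0, pts_t1, pairs):
    """ARAP residual: for point pairs assumed to lie on the same rigid
    part, the 3D distance between them should not change across frames."""
    d0 = np.linalg.norm(pts_t0[pairs[:, 0]] - pts_t0[pairs[:, 1]], axis=1)
    d1 = np.linalg.norm(pts_t1[pairs[:, 0]] - pts_t1[pairs[:, 1]], axis=1)
    return np.abs(d0 - d1)

# Toy example: three pixels at depth 2 m under assumed intrinsics.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
px = np.array([[320.0, 240.0], [400.0, 240.0], [320.0, 300.0]])
z = np.array([2.0, 2.0, 2.0])

p0 = lift_to_3d(px, z, K)
p1 = p0 + np.array([0.1, 0.0, 0.0])  # rigid motion: pure translation
pairs = np.array([[0, 1], [0, 2], [1, 2]])
res = arap_residual(p0, p1, pairs)
print(res.max())  # ~0: a rigid translation preserves pairwise distances
```

In SpatialTracker this kind of rigidity constraint is not applied with hand-picked pairs: the network learns a rigidity embedding that softly clusters pixels into rigid parts, so the ARAP penalty is weighted by how likely two points are to move rigidly together.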