
SpatialTracker: Tracking Any 2D Pixels in 3D Space

April 5, 2024
作者: Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou
cs.AI

Abstract

Recovering dense and long-range pixel motion in videos is a challenging problem. Part of the difficulty arises from the 3D-to-2D projection process, leading to occlusions and discontinuities in the 2D motion domain. While 2D motion can be intricate, we posit that the underlying 3D motion can often be simple and low-dimensional. In this work, we propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection. Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators, represents the 3D content of each frame efficiently using a triplane representation, and performs iterative updates using a transformer to estimate 3D trajectories. Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts. Extensive evaluation shows that our approach achieves state-of-the-art tracking performance both qualitatively and quantitatively, particularly in challenging scenarios such as out-of-plane rotation.
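The first step the abstract describes, lifting 2D pixels into 3D using a monocular depth estimator, amounts to standard pinhole-camera unprojection. The sketch below is illustrative only and not the authors' implementation; the function name and the assumption of a known intrinsics matrix `K` are ours, and the depth map would in practice come from an off-the-shelf monocular depth network.

```python
import numpy as np

def lift_pixels_to_3d(depth, K):
    """Unproject every pixel of a depth map to a 3D point in camera coordinates.

    This is a generic pinhole-camera sketch of the "lift 2D pixels to 3D"
    step, not SpatialTracker's actual code.

    depth: (H, W) array of per-pixel depths (e.g. from a monocular estimator).
    K:     (3, 3) camera intrinsics matrix (assumed known here).
    Returns an (H, W, 3) array of 3D points.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))          # pixel coordinates
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)     # homogeneous coords
    rays = pixels.reshape(-1, 3) @ np.linalg.inv(K).T       # unit-depth rays
    points = rays * depth.reshape(-1, 1)                    # scale rays by depth
    return points.reshape(H, W, 3)
```

With points lifted into 3D this way per frame, the method can represent them with a triplane encoding and regularize trajectories with the as-rigid-as-possible constraint, which the 2D projection alone would not support.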
