SpatialTracker: Tracking Any 2D Pixels in 3D Space

Abstract

Recovering dense and long-range pixel motion in videos is a challenging problem. Part of the difficulty arises from the 3D-to-2D projection process, which leads to occlusions and discontinuities in the 2D motion domain. While 2D motion can be intricate, we posit that the underlying 3D motion is often simple and low-dimensional. In this work, we propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection. Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators, represents the 3D content of each frame efficiently using a triplane representation, and performs iterative updates using a transformer to estimate 3D trajectories. Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts. Extensive evaluation shows that our approach achieves state-of-the-art tracking performance both qualitatively and quantitatively, particularly in challenging scenarios such as out-of-plane rotation.
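To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the SpatialTracker implementation. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` (which the abstract does not specify) and a depth map of the kind a monocular depth estimator would provide. The function names `lift_to_3d` and `arap_residual` are hypothetical, and the residual is a generic distance-preservation measure in the spirit of ARAP rather than the paper's actual loss.

```python
import numpy as np


def lift_to_3d(pixels, depth, fx, fy, cx, cy):
    """Back-project (u, v) pixels to camera-space 3D points.

    Assumes a pinhole camera model; `depth` is an (H, W) array of
    per-pixel depth values, e.g. from a monocular depth estimator.
    """
    u = pixels[:, 0]
    v = pixels[:, 1]
    # Nearest-pixel depth lookup: rows are indexed by v, columns by u.
    z = depth[np.round(v).astype(int), np.round(u).astype(int)]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def arap_residual(points_t0, points_t1, pairs):
    """Mean absolute change in pairwise 3D distance between two frames.

    `pairs` is an (M, 2) integer array of point indices. Points that
    move together rigidly keep their mutual distances, so the residual
    is zero for a rigid motion and grows as the motion deforms the pair.
    """
    i, j = pairs[:, 0], pairs[:, 1]
    d0 = np.linalg.norm(points_t0[i] - points_t0[j], axis=1)
    d1 = np.linalg.norm(points_t1[i] - points_t1[j], axis=1)
    return float(np.mean(np.abs(d1 - d0)))
```

For two tracked points, a pure translation between frames gives a residual of zero, while a non-rigid stretch gives a positive value. This illustrates why rigidity is simple to express in 3D, whereas the same motion projected to 2D can change pixel distances even for a rigid object.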
