Tracking Everything Everywhere All at Once

Abstract

We present a new test-time optimization method for estimating dense and long-range motion from a video sequence. Prior optical flow or particle video tracking algorithms typically operate within limited temporal windows, struggling to track through occlusions and maintain global consistency of estimated motion trajectories. We propose a complete and globally consistent motion representation, dubbed OmniMotion, that allows for accurate, full-length motion estimation of every pixel in a video. OmniMotion represents a video using a quasi-3D canonical volume and performs pixel-wise tracking via bijections between local and canonical space. This representation allows us to ensure global consistency, track through occlusions, and model any combination of camera and object motion. Extensive evaluations on the TAP-Vid benchmark and real-world footage show that our approach outperforms prior state-of-the-art methods by a large margin both quantitatively and qualitatively. See our project page for more results: http://omnimotion.github.io/
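To illustrate the idea of tracking via bijections between local and canonical space, here is a minimal sketch. It is not the OmniMotion implementation: it assumes a toy per-frame affine map as a stand-in for whatever invertible mapping the method actually learns, and the names `AffineBijection` and `track` are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass
from typing import Tuple

Point3 = Tuple[float, float, float]


@dataclass(frozen=True)
class AffineBijection:
    """Toy per-frame bijection (assumption): local 3D point <-> canonical 3D point."""
    scale: Point3  # every component must be nonzero so the map is invertible
    shift: Point3

    def to_canonical(self, p: Point3) -> Point3:
        # Forward map: local coordinates of this frame -> shared canonical space.
        return tuple(s * x + t for s, x, t in zip(self.scale, p, self.shift))

    def to_local(self, u: Point3) -> Point3:
        # Inverse map: canonical space -> local coordinates of this frame.
        return tuple((y - t) / s for s, y, t in zip(self.scale, u, self.shift))


def track(point: Point3, src: AffineBijection, dst: AffineBijection) -> Point3:
    """Map a local point in the source frame to the destination frame
    by passing through the shared canonical space."""
    return dst.to_local(src.to_canonical(point))


# Two hypothetical frames, each with its own bijection to canonical space.
frame_i = AffineBijection(scale=(2.0, 2.0, 1.0), shift=(1.0, 0.0, 0.0))
frame_j = AffineBijection(scale=(1.0, 1.0, 1.0), shift=(0.0, 3.0, 0.0))

p_i = (1.0, 2.0, 0.5)
p_j = track(p_i, frame_i, frame_j)      # frame i -> canonical -> frame j
p_back = track(p_j, frame_j, frame_i)   # frame j -> canonical -> frame i
```

Because every frame maps to the same canonical point, going from frame i to frame j and back recovers the starting point exactly. This cycle consistency is the sense in which a representation built on bijections keeps trajectories globally consistent.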
