DELTA: Dense Efficient Long-range 3D Tracking for any video

Abstract

Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. We introduce DELTA, a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos. Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a transformer-based upsampler to achieve high-resolution predictions. Unlike existing methods, which are limited by computational inefficiency or sparse tracking, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy. Furthermore, we explore the impact of depth representation on tracking performance and identify log-depth as the optimal choice. Extensive experiments demonstrate the superiority of DELTA on multiple benchmarks, achieving new state-of-the-art results in both 2D and 3D dense tracking tasks. Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.
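
As a rough illustration of why a log-depth representation can be convenient, the sketch below shows that a depth change in log space depends only on the ratio of the two depths, so the same relative motion gives the same offset for near and far points. This is a minimal sketch under our own assumptions, not the method's implementation; the helper names `to_log_depth`, `from_log_depth`, and `log_depth_change` are hypothetical.

```python
import math


def to_log_depth(depth: float) -> float:
    """Map a positive depth value to log space (hypothetical helper)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.log(depth)


def from_log_depth(log_depth: float) -> float:
    """Invert to_log_depth, recovering the original depth."""
    return math.exp(log_depth)


def log_depth_change(d_start: float, d_end: float) -> float:
    """Depth change in log space; equals log(d_end / d_start)."""
    return to_log_depth(d_end) - to_log_depth(d_start)


# A 10% depth increase for a near point and for a far point.
near = log_depth_change(1.0, 1.1)
far = log_depth_change(50.0, 55.0)

# In linear depth the two changes differ by a factor of 50 (0.1 vs 5.0),
# while in log space they agree up to floating-point error.
print(f"near: {near:.6f}, far: {far:.6f}")
```

Under this view, a network regressing log-depth offsets sees targets of comparable magnitude regardless of how far a point is from the camera, which is one plausible reason such a representation could help.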