Tracking Everything Everywhere All at Once

Abstract

We present a new test-time optimization method for estimating dense and long-range motion from a video sequence. Prior optical flow or particle video tracking algorithms typically operate within limited temporal windows, struggling to track through occlusions and maintain global consistency of estimated motion trajectories. We propose a complete and globally consistent motion representation, dubbed OmniMotion, that allows for accurate, full-length motion estimation of every pixel in a video. OmniMotion represents a video using a quasi-3D canonical volume and performs pixel-wise tracking via bijections between local and canonical space. This representation allows us to ensure global consistency, track through occlusions, and model any combination of camera and object motion. Extensive evaluations on the TAP-Vid benchmark and real-world footage show that our approach outperforms prior state-of-the-art methods by a large margin both quantitatively and qualitatively. See our project page for more results: http://omnimotion.github.io/
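
As a rough illustration of tracking through a shared canonical space, the sketch below maps a point from one frame to another by composing one frame's map with the inverse of another's. It is a toy under stated assumptions, not the method's implementation: a rigid rotation plus translation stands in for the learned bijections, and `FrameMap`, `to_canonical`, `to_local`, and `track` are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameMap:
    """Toy invertible map from one frame's local 3D space to the canonical space.

    Assumption: a rotation about the z-axis plus a translation stands in for
    a learned bijection. All names here are hypothetical.
    """
    angle: float   # rotation about the z-axis, in radians
    shift: tuple   # translation (tx, ty, tz)

    def to_canonical(self, p):
        # Forward map: u = R p + t
        x, y, z = p
        c, s = math.cos(self.angle), math.sin(self.angle)
        tx, ty, tz = self.shift
        return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

    def to_local(self, u):
        # Inverse map: p = R^T (u - t)
        tx, ty, tz = self.shift
        x, y, z = u[0] - tx, u[1] - ty, u[2] - tz
        c, s = math.cos(self.angle), math.sin(self.angle)
        return (c * x + s * y, -s * x + c * y, z)


def track(p, src, dst):
    """Map a local point in frame `src` to frame `dst` via the canonical space."""
    return dst.to_local(src.to_canonical(p))


# Three frames with arbitrary (made-up) parameters.
f0 = FrameMap(angle=0.0, shift=(0.0, 0.0, 0.0))
f1 = FrameMap(angle=0.3, shift=(1.0, -2.0, 0.5))
f2 = FrameMap(angle=-1.1, shift=(0.2, 0.4, -0.7))

p0 = (0.5, 1.5, 2.0)
direct = track(p0, f0, f2)                   # frame 0 -> frame 2
chained = track(track(p0, f0, f1), f1, f2)   # frame 0 -> 1 -> 2
round_trip = track(direct, f2, f0)           # frame 0 -> 2 -> 0
```

Because every frame maps into the same canonical space, chaining frame 0 to 1 to 2 lands on the same point as mapping 0 to 2 directly (up to floating-point error), and mapping back returns the starting point. This is the sense in which correspondences built from bijections are consistent across the whole video.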
