4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

Abstract

We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches, which typically decouple motion from geometry or produce only limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere-and-anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form, decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.
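The encode-once, query-anywhere-and-anytime interface and the factorized output can be illustrated with a toy sketch. Everything in it is hypothetical: the names Frame, Latent, ToyModel, encode, query, and compose are illustrative only; the "latent" simply caches the input frames rather than being a learned compact representation; and the relative motion uses a constant-velocity model chosen purely to make the arithmetic concrete. It is not the 4RC architecture. It only shows the control flow described above: the video is encoded once, and each (query frame, target timestamp) pair is answered as base geometry plus time-dependent relative motion.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Frame:
    # Hypothetical toy input: per-pixel 3D points and velocities stand in for image content.
    timestamp: float
    points: Tuple[Vec3, ...]
    velocities: Tuple[Vec3, ...]


@dataclass(frozen=True)
class Latent:
    # Hypothetical stand-in for the spatio-temporal latent; here it just caches the frames.
    frames: Tuple[Frame, ...]


class ToyModel:
    def __init__(self) -> None:
        self.encode_calls = 0  # counts backbone passes to show encoding happens once

    def encode(self, video: List[Frame]) -> Latent:
        # Stand-in for the transformer backbone: one pass over the whole video.
        self.encode_calls += 1
        return Latent(frames=tuple(video))

    def query(self, latent: Latent, query_frame: int,
              target_time: float) -> Tuple[List[Vec3], List[Vec3]]:
        # Stand-in for the conditional decoder: reads only the cached latent, never the video.
        frame = latent.frames[query_frame]
        dt = target_time - frame.timestamp
        base = list(frame.points)  # base geometry of the query frame
        # Time-dependent relative motion; constant velocity is an assumption of this toy.
        motion = [(vx * dt, vy * dt, vz * dt) for vx, vy, vz in frame.velocities]
        return base, motion


def compose(base: List[Vec3], motion: List[Vec3]) -> List[Vec3]:
    # Points of the query frame at the target time = base geometry + relative motion.
    return [(bx + mx, by + my, bz + mz)
            for (bx, by, bz), (mx, my, mz) in zip(base, motion)]


if __name__ == "__main__":
    video = [
        Frame(0.0, ((0.0, 0.0, 1.0),), ((1.0, 0.0, 0.0),)),
        Frame(1.0, ((1.0, 0.0, 1.0),), ((1.0, 0.0, 0.0),)),
    ]
    model = ToyModel()
    latent = model.encode(video)  # encode once
    for i in range(len(video)):
        for t in (0.0, 0.5, 1.0):  # query any frame at any timestamp
            print(i, t, compose(*model.query(latent, i, t)))
    print("encoder passes:", model.encode_calls)
```

In this toy, answering every frame at every timestamp costs a single encoder pass plus one cheap decoder call per query, and the relative motion of a frame queried at its own timestamp is zero, so the base geometry is returned unchanged.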
