4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

Abstract

We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere and anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form by decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.
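The call pattern described above (encode the video once, then query any frame at any target time, with each answer split into base geometry plus time-dependent relative motion) can be illustrated with a toy sketch. This is not the 4RC architecture: the class `Toy4DModel`, its methods, the mean-colour "latent", the flat-plane base geometry, the linear motion model, and the additive composition of the two factors are all hypothetical choices made only so the example runs.

```python
import numpy as np


class Toy4DModel:
    """Toy stand-in (hypothetical): shows the encode-once, query-many pattern only."""

    def __init__(self):
        self.encode_calls = 0

    def encode(self, video):
        # video: (T, H, W, 3) array. A real backbone would return learned latent
        # tokens; here the "latent" is just the per-frame mean colour.
        self.encode_calls += 1
        return video.mean(axis=(1, 2))  # shape (T, 3)

    def query(self, latent, query_frame, target_time, height, width):
        # Base geometry: one 3D point per pixel of the query frame
        # (toy choice: a flat plane at z = 1).
        ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
        base = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
        # Relative motion: toy linear displacement that is zero when the target
        # time equals the query frame's own index (an assumption of this sketch).
        velocity = latent[query_frame]  # (3,) treated as a per-frame velocity
        motion = np.broadcast_to((target_time - query_frame) * velocity, base.shape)
        # Assumed additive composition of the two factors.
        return base, motion, base + motion


video = np.random.default_rng(0).random((4, 2, 3, 3))  # T=4, H=2, W=3
model = Toy4DModel()
latent = model.encode(video)  # the video is encoded a single time
results = [
    model.query(latent, query_frame=1, target_time=t, height=2, width=3)
    for t in range(4)  # many queries reuse the same latent
]
```

The point of the sketch is the cost structure: the expensive step (`encode`) runs once per video, while each `query` call touches only the cached latent and the requested (query frame, target time) pair.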
