

4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

February 10, 2026
作者: Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy
cs.AI

Abstract

We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere and anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form by decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.
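The encode-once, query-anywhere-and-anytime paradigm and the base-geometry-plus-relative-motion factorization can be sketched in toy form. This is an illustrative mock-up, not the paper's implementation: the random-projection "encoder", the head layouts, and all dimensions are invented stand-ins for the transformer backbone and conditional decoder the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): T frames of H x W pixels,
# encoded into D-dim tokens.
T, H, W, D = 8, 4, 4, 16

def encode_video(frames):
    """Encode the whole clip ONCE into a compact spatio-temporal latent.
    Stand-in for the transformer backbone: a fixed random projection."""
    proj = rng.standard_normal((frames.shape[-1], D))
    return frames.reshape(T, -1, frames.shape[-1]) @ proj  # (T, H*W, D)

def query(latent, frame_idx, t):
    """Stand-in for the conditional decoder: given any query frame and any
    target timestamp t, return per-pixel 4D attributes in the minimally
    factorized form: base geometry + time-dependent relative motion."""
    tokens = latent[frame_idx]            # (H*W, D), no re-encoding needed
    base_geometry = tokens[:, :3]         # toy "geometry head"
    relative_motion = t * tokens[:, 3:6]  # toy "motion head", zero at t = 0
    return base_geometry + relative_motion

frames = rng.standard_normal((T, H, W, 3))
latent = encode_video(frames)             # encode once
p0 = query(latent, frame_idx=2, t=0.0)    # frame 2 at its own timestamp
p1 = query(latent, frame_idx=2, t=0.5)    # same frame, later timestamp
```

The point of the pattern is amortization: the expensive encoding runs once per video, after which each `(frame, timestamp)` query is a cheap decoder pass over the shared latent.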
February 24, 2026