4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time
June 23, 2025
作者: Ziqiao Ma, Xuweiyi Chen, Shoubin Yu, Sai Bi, Kai Zhang, Chen Ziwen, Sihan Xu, Jianing Yang, Zexiang Xu, Kalyan Sunkavalli, Mohit Bansal, Joyce Chai, Hao Tan
cs.AI
Abstract
Can we scale 4D pretraining to learn general space-time representations that
reconstruct an object from a few views at some times to any view at any time?
We provide an affirmative answer with 4D-LRM, the first large-scale 4D
reconstruction model that takes input from unconstrained views and timestamps
and renders arbitrary novel view-time combinations. Unlike prior 4D approaches
(e.g., optimization-based, geometry-based, or generative ones), which struggle
with efficiency, generalization, or faithfulness, 4D-LRM learns a unified space-time
representation and directly predicts per-pixel 4D Gaussian primitives from
posed image tokens across time, enabling fast, high-quality rendering at, in
principle, infinite frame rate. Our results demonstrate that scaling
spatiotemporal pretraining enables accurate and efficient 4D reconstruction. We
show that 4D-LRM generalizes to novel objects, interpolates across time, and
handles diverse camera setups. It reconstructs 24-frame sequences in one
forward pass in less than 1.5 seconds on a single A100 GPU.
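
To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of a transformer that consumes posed image tokens across time and regresses per-pixel 4D Gaussian primitives. This is not the authors' released code: the names (`Toy4DLRM`, `Toy4DGaussianHead`), the assumption that pose and timestamp are already embedded into each token, and the exact parameter layout (space-time mean, spatial/temporal scales, rotation quaternion, opacity, RGB) are illustrative assumptions only.

```python
# Illustrative sketch (NOT the authors' implementation) of predicting per-pixel
# 4D Gaussian primitives from posed image tokens with a plain transformer.
import torch
import torch.nn as nn


class Toy4DGaussianHead(nn.Module):
    """Maps per-token features to a flat vector of assumed 4D Gaussian parameters."""

    # 3 (xyz) + 1 (t) + 3 (spatial scales) + 1 (temporal scale)
    # + 4 (rotation quaternion) + 1 (opacity) + 3 (rgb) = 16 parameters per token
    PARAM_DIM = 16

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, self.PARAM_DIM)

    def forward(self, tokens: torch.Tensor) -> dict:
        p = self.proj(tokens)                                            # (B, N, 16)
        return {
            "xyzt": p[..., 0:4],                                         # space-time mean
            "scales": torch.exp(p[..., 4:8]),                            # positive extents
            "rotation": nn.functional.normalize(p[..., 8:12], dim=-1),   # unit quaternion
            "opacity": torch.sigmoid(p[..., 12:13]),
            "rgb": torch.sigmoid(p[..., 13:16]),
        }


class Toy4DLRM(nn.Module):
    """Transformer over posed image tokens from a few views and timestamps."""

    def __init__(self, dim: int = 256, depth: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)
        self.head = Toy4DGaussianHead(dim)

    def forward(self, posed_tokens: torch.Tensor) -> dict:
        # posed_tokens: (B, N, dim); camera pose and timestamp are assumed to be
        # already encoded into each token (e.g., via ray and time embeddings).
        return self.head(self.backbone(posed_tokens))


if __name__ == "__main__":
    model = Toy4DLRM()
    tokens = torch.randn(2, 1024, 256)   # e.g., 4 input views x 256 patch tokens
    gaussians = model(tokens)
    print({k: tuple(v.shape) for k, v in gaussians.items()})
```

Running the script only prints the shapes of the predicted Gaussian attributes; in a full system, these primitives would be rasterized by a 4D Gaussian splatting renderer to produce images at arbitrary view-time queries.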