V-DPM: 4D Video Reconstruction with Dynamic Point Maps

January 14, 2026
Authors: Edgar Sucar, Eldar Insafutdinov, Zihang Lai, Andrea Vedaldi
cs.AI

Abstract

Powerful 3D representations such as DUSt3R's invariant point maps, which encode 3D shape and camera parameters, have significantly advanced feed-forward 3D reconstruction. While point maps assume static scenes, Dynamic Point Maps (DPMs) extend this concept to dynamic 3D content by additionally representing scene motion. However, existing DPMs are limited to image pairs and, like DUSt3R, require post-processing via optimization when more than two views are involved. We argue that DPMs are more useful when applied to videos, and introduce V-DPM to demonstrate this. First, we show how to formulate DPMs for video input in a way that maximizes representational power, facilitates neural prediction, and enables reuse of pretrained models. Second, we implement these ideas on top of VGGT, a recent and powerful 3D reconstructor. Although VGGT was trained on static scenes, we show that a modest amount of synthetic data is sufficient to adapt it into an effective V-DPM predictor. Our approach achieves state-of-the-art performance in 3D and 4D reconstruction for dynamic scenes. In particular, unlike recent dynamic extensions of VGGT such as P3, DPMs recover not only dynamic depth but also the full 3D motion of every point in the scene.
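To make the representation concrete, the following is a minimal sketch of the point-map idea the abstract builds on: a (static) point map assigns every pixel a 3D coordinate, and a dynamic point map additionally stores where each of those surface points sits at a second timestamp, so their difference is per-pixel 3D scene motion. This is an illustrative toy with made-up intrinsics, depth, and motion; in DUSt3R/V-DPM these maps are predicted by a network, not computed from ground-truth depth as done here.

```python
import numpy as np

def unproject(depth, K):
    """Back-project a depth map into a point map: an H x W x 3 array
    giving each pixel's 3D coordinates in the camera frame.
    (Illustrative only; point-map methods predict this directly.)"""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))            # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T                           # camera rays
    return rays * depth[..., None]                            # scale by depth

# Toy example: 2x2 image, identity intrinsics, unit depth.
K = np.eye(3)
depth_t0 = np.ones((2, 2))
P_t0 = unproject(depth_t0, K)        # static point map at time t0

# A dynamic point map pairs P_t0 with the same surface points
# re-expressed at a second time t1; here the "motion" is a
# made-up rigid translation along x for illustration.
motion = np.array([0.1, 0.0, 0.0])
P_t1 = P_t0 + motion
scene_flow = P_t1 - P_t0             # per-pixel 3D motion
```

Pairing `P_t0` and `P_t1` per pixel is what lets a DPM-style predictor output full 3D motion rather than only dynamic depth.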