MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
February 9, 2026
Authors: Ruijie Zhu, Jiahao Lu, Wenbo Hu, Xiaoguang Han, Jianfei Cai, Ying Shan, Chuanxia Zheng
cs.AI
Abstract
We introduce MotionCrafter, a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense motion from a monocular video. The core of our method is a novel joint representation of dense 3D point maps and 3D scene flows in a shared coordinate system, together with a novel 4D VAE that effectively learns this representation. Unlike prior work that forces 3D values and latents to align strictly with RGB-VAE latents (despite their fundamentally different distributions), we show that such alignment is unnecessary and leads to suboptimal performance. Instead, we introduce a new data normalization and VAE training strategy that better transfers diffusion priors and greatly improves reconstruction quality. Extensive experiments across multiple datasets demonstrate that MotionCrafter achieves state-of-the-art performance in both geometry reconstruction and dense scene flow estimation, delivering 38.64% and 25.0% improvements in geometry and motion accuracy, respectively, all without any post-optimization. Project page: https://ruijiezhu94.github.io/MotionCrafter_Page
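The joint representation described above could be sketched roughly as follows. This is an illustrative assumption only, not the paper's actual implementation: the channel layout, the `pack_joint_representation` name, and the percentile-based normalization (chosen here to illustrate scaling geometry by a robust scene extent rather than forcing values into the RGB-VAE latent range) are all hypothetical.

```python
import numpy as np

def pack_joint_representation(point_maps, scene_flows):
    """Stack per-frame 3D point maps and 3D scene flows into one tensor.

    point_maps:  (T, H, W, 3) per-pixel XYZ coordinates in a shared world frame.
    scene_flows: (T, H, W, 3) per-pixel 3D motion vectors in the same frame.

    Returns a (T, H, W, 6) tensor plus the normalization scale, so a 4D VAE
    can see geometry and motion in a common coordinate system.
    """
    # Hypothetical normalization: divide by a robust estimate of the scene
    # extent so both geometry and motion share one metric scale.
    scale = np.percentile(np.abs(point_maps), 95) + 1e-6
    joint = np.concatenate([point_maps / scale, scene_flows / scale], axis=-1)
    return joint, scale
```

Dividing both channels by the same scale keeps point positions and flow vectors metrically consistent, so a decoded flow can be added directly to a decoded point map.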