
Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis

January 20, 2026
Authors: Hongyuan Chen, Xingyu Chen, Youjia Zhang, Zexiang Xu, Anpei Chen
cs.AI

Abstract

We present Motion 3-to-4, a feed-forward framework for synthesising high-quality 4D dynamic objects from a single monocular video and an optional 3D reference mesh. While recent advances have significantly improved 2D, video, and 3D content generation, 4D synthesis remains difficult due to limited training data and the inherent ambiguity of recovering geometry and motion from a monocular viewpoint. Motion 3-to-4 addresses these challenges by decomposing 4D synthesis into static 3D shape generation and motion reconstruction. Using a canonical reference mesh, our model learns a compact motion latent representation and predicts per-frame vertex trajectories to recover complete, temporally coherent geometry. A scalable frame-wise transformer further enables robustness to varying sequence lengths. Evaluations on both standard benchmarks and a new dataset with accurate ground-truth geometry show that Motion 3-to-4 delivers superior fidelity and spatial consistency compared to prior work. Project page is available at https://motion3-to-4.github.io/.
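
To make the decomposition described above more concrete, the sketch below illustrates, in a minimal PyTorch-style module, how per-frame vertex trajectories might be predicted for a canonical reference mesh conditioned on a compact per-frame motion latent. This is an illustrative sketch under assumed design choices, not the authors' implementation; all names (FrameWiseMotionDecoder, latent_dim, and so on) are hypothetical.

```python
# Hypothetical sketch: frame-wise transformer that deforms a canonical mesh
# by predicting per-frame vertex displacements from a motion latent.
# Not the authors' code; names and dimensions are illustrative only.
import torch
import torch.nn as nn


class FrameWiseMotionDecoder(nn.Module):
    """Predict per-frame vertex displacements of a canonical mesh
    from one compact motion latent per video frame (illustrative)."""

    def __init__(self, latent_dim=256, hidden_dim=512, num_layers=4, num_heads=8):
        super().__init__()
        self.vertex_embed = nn.Linear(3, hidden_dim)            # embed canonical xyz
        self.latent_embed = nn.Linear(latent_dim, hidden_dim)   # embed motion latent
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.to_offset = nn.Linear(hidden_dim, 3)               # per-vertex displacement

    def forward(self, canonical_verts, motion_latents):
        # canonical_verts: (V, 3) vertices of the static reference mesh
        # motion_latents:  (T, latent_dim), one compact latent per frame
        T = motion_latents.shape[0]
        V = canonical_verts.shape[0]
        verts = self.vertex_embed(canonical_verts).unsqueeze(0).expand(T, V, -1)
        lat = self.latent_embed(motion_latents).unsqueeze(1)     # (T, 1, H)
        tokens = self.transformer(verts + lat)                   # frames batched independently
        offsets = self.to_offset(tokens)                         # (T, V, 3)
        # Per-frame geometry = canonical shape + predicted vertex trajectory
        return canonical_verts.unsqueeze(0) + offsets


# Usage sketch: a 24-frame sequence over a 5,000-vertex canonical mesh.
decoder = FrameWiseMotionDecoder()
canonical = torch.randn(5000, 3)
latents = torch.randn(24, 256)
per_frame_verts = decoder(canonical, latents)  # (24, 5000, 3)
```

Because each frame is handled as an independent batch element of the transformer, such a module accepts sequences of arbitrary length, which is consistent with the robustness to varying sequence lengths claimed in the abstract.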