

Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

December 4, 2025
Authors: Yanran Zhang, Ziyi Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
cs.AI

Abstract

Generating interactive and dynamic 4D scenes from a single static image remains a core challenge. Most existing generate-then-reconstruct and reconstruct-then-generate methods decouple geometry from motion, causing spatiotemporal inconsistencies and poor generalization. To address these issues, we extend the reconstruct-then-generate framework to jointly perform Motion generation and geometric Reconstruction for 4D Synthesis (MoRe4D). We first introduce TrajScene-60K, a large-scale dataset of 60,000 video samples with dense point trajectories, addressing the scarcity of high-quality 4D scene data. Building on this dataset, we propose a diffusion-based 4D Scene Trajectory Generator (4D-STraG) that jointly generates geometrically consistent and motion-plausible 4D point trajectories. To leverage single-view priors, we design a depth-guided motion normalization strategy and a motion-aware module for effective integration of geometry and dynamics. We then propose a 4D View Synthesis Module (4D-ViSM) to render videos along arbitrary camera trajectories from the 4D point-track representation. Experiments show that MoRe4D generates high-quality 4D scenes with multi-view consistency and rich dynamic details from a single image. Code: https://github.com/Zhangyr2022/MoRe4D.
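
To make the point-trajectory representation more concrete, here is a minimal NumPy sketch of what a depth-guided motion normalization could look like. Everything in it is an assumption for illustration: the tensor shapes, the function names, and the specific scheme of scaling displacements by inverse depth are not taken from the paper's implementation.

```python
import numpy as np

def normalize_motion_by_depth(tracks, depth, eps=1e-6):
    """Hypothetical depth-guided motion normalization.

    tracks: (T, N, 3) 3D point trajectories over T frames, as a
            trajectory generator such as 4D-STraG might produce.
    depth:  (N,) per-point depth from a single-view depth prior
            at the reference (first) frame.

    Scales per-frame displacements inversely with depth so that
    near and far points contribute on a comparable scale.
    """
    ref = tracks[0]                                   # reference geometry, (N, 3)
    disp = tracks - ref[None]                         # displacements, (T, N, 3)
    disp_norm = disp / (depth[None, :, None] + eps)   # depth-scaled motion
    return ref, disp_norm

def denormalize_motion(ref, disp_norm, depth, eps=1e-6):
    """Exact inverse: recover metric trajectories from normalized motion."""
    return ref[None] + disp_norm * (depth[None, :, None] + eps)

# Toy round-trip check: 8 frames, 1024 tracked points.
T, N = 8, 1024
tracks = np.random.randn(T, N, 3).astype(np.float32)
depth = np.abs(np.random.randn(N).astype(np.float32)) + 1.0
ref, disp_norm = normalize_motion_by_depth(tracks, depth)
assert np.allclose(denormalize_motion(ref, disp_norm, depth), tracks)
```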
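The view-synthesis stage can likewise be pictured as projecting the generated trajectories through an arbitrary camera path. The sketch below assumes a plain pinhole camera model and is purely illustrative; the actual 4D-ViSM is a learned rendering module, and `project_tracks` is a hypothetical helper, not part of the released code.

```python
import numpy as np

def project_tracks(tracks, K, extrinsics):
    """Project 4D point tracks into a moving pinhole camera (illustrative only).

    tracks:     (T, N, 3) world-space point trajectories.
    K:          (3, 3) camera intrinsics.
    extrinsics: (T, 4, 4) world-to-camera transforms, one per frame.

    Returns (T, N, 2) pixel coordinates, the raw geometry a learned
    renderer like 4D-ViSM would turn into RGB frames.
    """
    T, N, _ = tracks.shape
    homo = np.concatenate([tracks, np.ones((T, N, 1))], axis=-1)  # (T, N, 4)
    cam = np.einsum('tij,tnj->tni', extrinsics, homo)[..., :3]    # camera space
    pix = np.einsum('ij,tnj->tni', K, cam)                        # apply intrinsics
    return pix[..., :2] / pix[..., 2:3]                           # perspective divide

# Toy usage: a static camera looking down +z at points kept in front of it.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
extr = np.tile(np.eye(4), (8, 1, 1))
pts = np.random.randn(8, 100, 3)
pts[..., 2] += 5.0                 # shift points in front of the camera
uv = project_tracks(pts, K, extr)  # (8, 100, 2) pixel trajectories
```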