SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos
December 9, 2025
Authors: Mingqi Gao, Yunqi Miao, Jungong Han
cs.AI
Abstract
Human Mesh Recovery (HMR) aims to reconstruct 3D human pose and shape from 2D observations and is fundamental to human-centric understanding in real-world scenarios. While recent image-based HMR methods such as SAM 3D Body are robust on in-the-wild images, they rely on per-frame inference when applied to videos, which leads to temporal inconsistency and degraded performance under occlusion. We address these issues without extra training by leveraging the inherent continuity of human motion in videos. We propose SAM-Body4D, a training-free framework for temporally consistent and occlusion-robust HMR from videos. We first generate identity-consistent masklets using a promptable video segmentation model, then refine them with an Occlusion-Aware module to recover missing regions. The refined masklets guide SAM 3D Body to produce consistent full-body mesh trajectories, while a padding-based parallel strategy enables efficient multi-human inference. Experimental results demonstrate that SAM-Body4D improves temporal stability and robustness in challenging in-the-wild videos, without any retraining. Our code and demo are available at: https://github.com/gaomingqi/sam-body4d.
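The abstract does not say how the padding-based parallel strategy is implemented. As a rough illustration of the general idea only, the sketch below pads a variable number of per-person crops in each frame to a fixed slot count, runs a single batched call, and then discards the padded slots. The names `pad_people`, `run_batched`, and `model_fn` are hypothetical and are not taken from the SAM-Body4D code base; `model_fn` stands in for any per-person mesh regressor.

```python
import numpy as np


def pad_people(crops_per_frame, max_people, crop_shape):
    """Pack a variable number of per-person crops into a fixed-size batch.

    crops_per_frame: one list per frame, each holding arrays of shape crop_shape.
    Returns (batch, valid): batch has shape (T, max_people, *crop_shape) and
    valid is a boolean (T, max_people) mask that marks the real entries.
    """
    num_frames = len(crops_per_frame)
    batch = np.zeros((num_frames, max_people, *crop_shape), dtype=np.float32)
    valid = np.zeros((num_frames, max_people), dtype=bool)
    for t, crops in enumerate(crops_per_frame):
        if len(crops) > max_people:
            raise ValueError(f"frame {t} has more than {max_people} people")
        for p, crop in enumerate(crops):
            batch[t, p] = crop
            valid[t, p] = True
    return batch, valid


def run_batched(model_fn, batch, valid):
    """Call model_fn once on every slot, then keep only the real people.

    model_fn is a hypothetical placeholder mapping (N, *crop_shape) -> (N, D).
    """
    num_frames, max_people = valid.shape
    flat = batch.reshape(num_frames * max_people, *batch.shape[2:])
    out = np.asarray(model_fn(flat)).reshape(num_frames, max_people, -1)
    # Boolean indexing drops the padded slots for each frame.
    return [out[t][valid[t]] for t in range(num_frames)]
```

Padded slots still cost compute, so this trade-off tends to pay off when the number of people per frame stays close to `max_people`; whether SAM-Body4D pads in this way or along another dimension is an assumption here.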