DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior
August 1, 2025
Authors: Junzhe Lu, Jing Lin, Hongkun Dou, Ailing Zeng, Yue Deng, Xian Liu, Zhongang Cai, Lei Yang, Yulun Zhang, Haoqian Wang, Ziwei Liu
cs.AI
Abstract
We present DPoser-X, a diffusion-based prior model for 3D whole-body human
poses. Building a versatile and robust whole-body human pose prior remains
challenging due to the inherent complexity of articulated human poses and the
scarcity of high-quality whole-body pose datasets. To address these
limitations, we introduce a Diffusion model as body Pose prior (DPoser) and
extend it to DPoser-X for expressive whole-body human pose modeling. Our
approach unifies various pose-centric tasks as inverse problems, solving them
through variational diffusion sampling. To enhance performance on downstream
applications, we introduce a novel truncated timestep scheduling method
specifically designed for pose data characteristics. We also propose a masked
training mechanism that effectively combines whole-body and part-specific
datasets, enabling our model to capture interdependencies between body parts
while avoiding overfitting to specific actions. Extensive experiments
demonstrate DPoser-X's robustness and versatility across multiple benchmarks
for body, hand, face, and whole-body pose modeling. Our model consistently
outperforms state-of-the-art alternatives, establishing a new benchmark for
whole-body human pose prior modeling.
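
The abstract names two mechanisms without giving details: masked training over mixed whole-body and part-specific data, and a truncated timestep schedule. The sketch below is only a toy illustration of what such ideas could look like, not the authors' implementation. The part layout `PART_DIMS`, the helper names, and the range `t_start=0.3` to `t_end=0.01` are all assumptions made for the example.

```python
# Illustrative sketch only. Every name, size, and range here is a
# hypothetical choice for the example, not a value from DPoser-X.

# Hypothetical per-part sizes of a concatenated whole-body pose vector.
PART_DIMS = {"body": 63, "hands": 90, "face": 3}


def part_mask(available_parts):
    """Return a 0/1 mask marking which dimensions a sample actually observes."""
    mask = []
    for name, dim in PART_DIMS.items():
        mask.extend([1.0 if name in available_parts else 0.0] * dim)
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error over observed (mask == 1) dimensions only."""
    observed = sum(mask)
    if observed == 0:
        return 0.0
    total = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    return total / observed


def truncated_timesteps(num_steps, t_start=0.3, t_end=0.01):
    """Linearly spaced timesteps from t_start down to t_end.

    Choosing t_start < 1 truncates the schedule so that sampling never
    visits the noisiest part of the diffusion process.
    """
    if num_steps == 1:
        return [t_start]
    step = (t_start - t_end) / (num_steps - 1)
    return [t_start - i * step for i in range(num_steps)]


if __name__ == "__main__":
    # A hand-only sample contributes loss on the hand dimensions alone.
    mask = part_mask({"hands"})
    print(int(sum(mask)), "of", len(mask), "dimensions observed")
    print(truncated_timesteps(4))
```

In this toy setup, a sample from a hand-only dataset contributes gradients only through the hand dimensions, which is one plausible way for part-specific data to be mixed with whole-body data in a single model.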