DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

Abstract

We present DPoser-X, a diffusion-based prior model for 3D whole-body human poses. Building a versatile and robust whole-body pose prior remains challenging because articulated human poses are inherently complex and high-quality whole-body pose datasets are scarce. To address these limitations, we introduce a Diffusion model as body Pose prior (DPoser) and extend it to DPoser-X for expressive whole-body human pose modeling. Our approach unifies various pose-centric tasks as inverse problems and solves them through variational diffusion sampling. To improve performance on downstream applications, we introduce a novel truncated timestep scheduling method designed specifically for the characteristics of pose data. We also propose a masked training mechanism that effectively combines whole-body and part-specific datasets, enabling the model to capture interdependencies between body parts while avoiding overfitting to specific actions. Extensive experiments demonstrate DPoser-X's robustness and versatility across multiple benchmarks for body, hand, face, and whole-body pose modeling. Our model consistently outperforms state-of-the-art alternatives, establishing a new benchmark for whole-body human pose prior modeling.
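
The abstract names two mechanisms without giving their details: a truncated timestep schedule and masked training over mixed whole-body and part-specific data. The Python sketch below shows one way such ideas could look in practice. Everything in it is an illustrative assumption rather than the authors' implementation: the function names `truncated_timesteps` and `masked_mse`, the bounds `t_max` and `t_min`, the linear annealing, and the use of a per-entry binary mask are all hypothetical.

```python
from typing import List, Sequence


def truncated_timesteps(num_steps: int, t_max: float = 0.2, t_min: float = 0.05) -> List[float]:
    """Return num_steps diffusion times decreasing linearly from t_max to t_min.

    Assumption: "truncated" is read here as visiting only a sub-interval of
    the full diffusion time range [0, 1]. The default bounds are placeholders,
    not values taken from the paper.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if not 0.0 <= t_min <= t_max <= 1.0:
        raise ValueError("expected 0 <= t_min <= t_max <= 1")
    if num_steps == 1:
        return [t_max]
    step = (t_max - t_min) / (num_steps - 1)
    return [t_max - i * step for i in range(num_steps)]


def masked_mse(pred: Sequence[float], target: Sequence[float], mask: Sequence[int]) -> float:
    """Mean squared error over the entries whose mask value is 1.

    Assumption: a part-specific sample (for example, hands only) marks the
    pose parameters it does not provide with mask 0, so those entries
    contribute nothing to the training loss.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have the same length")
    kept = sum(mask)
    if kept == 0:
        return 0.0  # nothing observed in this sample
    total = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    return total / kept


if __name__ == "__main__":
    # Four optimization steps restricted to the assumed range [0.05, 0.2].
    print(truncated_timesteps(4))
    # The third parameter is unobserved, so only the first two are scored.
    print(masked_mse([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [1, 1, 0]))
```

In this reading, an optimization loop would step through the returned times from largest to smallest, and a training loop would apply `masked_mse` to the denoiser output so that whole-body and part-only samples can share one model.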
