DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

Abstract

We present DPoser-X, a diffusion-based prior model for 3D whole-body human poses. Building a versatile and robust whole-body human pose prior remains challenging due to the inherent complexity of articulated human poses and the scarcity of high-quality whole-body pose datasets. To address these limitations, we introduce a Diffusion model as body Pose prior (DPoser) and extend it to DPoser-X for expressive whole-body human pose modeling. Our approach unifies various pose-centric tasks as inverse problems, solving them through variational diffusion sampling. To enhance performance on downstream applications, we introduce a novel truncated timestep scheduling method specifically designed for pose data characteristics. We also propose a masked training mechanism that effectively combines whole-body and part-specific datasets, enabling our model to capture interdependencies between body parts while avoiding overfitting to specific actions. Extensive experiments demonstrate DPoser-X's robustness and versatility across multiple benchmarks for body, hand, face, and whole-body pose modeling. Our model consistently outperforms state-of-the-art alternatives, setting a new state of the art for whole-body human pose prior modeling.
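The idea of combining whole-body and part-specific datasets through masked training can be illustrated with a minimal sketch. It assumes one common approach: a per-sample mask over pose segments restricts the denoising loss to the parts that a sample actually annotates, so a hand-only sample supervises only the hand dimensions. The segment layout in `SEGMENTS`, the size `POSE_DIM`, and the helpers `part_mask` and `masked_denoising_loss` are hypothetical illustrations. They are not DPoser-X's actual code or pose parameterization.

```python
import numpy as np

# Hypothetical layout of a flattened whole-body pose vector. The sizes are
# illustrative assumptions (e.g. 21 body joints x 3, 15 joints x 3 per hand,
# 50 face dims), not DPoser-X's actual parameterization.
SEGMENTS = {
    "body": slice(0, 63),
    "left_hand": slice(63, 108),
    "right_hand": slice(108, 153),
    "face": slice(153, 203),
}
POSE_DIM = 203


def part_mask(available_parts):
    """Return a 0/1 mask over POSE_DIM marking the segments a sample annotates."""
    mask = np.zeros(POSE_DIM, dtype=np.float32)
    for name in available_parts:
        mask[SEGMENTS[name]] = 1.0
    return mask


def masked_denoising_loss(eps_pred, eps_true, mask):
    """Mean squared noise-prediction error over annotated dimensions only.

    eps_pred, eps_true, mask: arrays of shape (batch, POSE_DIM).
    Dimensions with mask == 0 add nothing to the loss, so samples from
    part-specific datasets supervise only the parts they contain.
    """
    sq_err = (eps_pred - eps_true) ** 2 * mask
    return sq_err.sum() / np.maximum(mask.sum(), 1.0)


# Usage: a batch with one whole-body sample and one hand-only sample.
rng = np.random.default_rng(0)
masks = np.stack([
    part_mask(SEGMENTS),                      # whole-body sample: all parts
    part_mask(["left_hand", "right_hand"]),   # sample from a hand-only dataset
])
eps_true = rng.standard_normal((2, POSE_DIM))
eps_pred = eps_true.copy()
eps_pred[1, SEGMENTS["body"]] += 10.0  # error on an unannotated part is ignored
print(masked_denoising_loss(eps_pred, eps_true, masks))
```

Normalizing by the number of annotated dimensions, rather than by the full batch size, keeps the loss scale comparable between batches with different mixes of whole-body and part-only samples. Whether DPoser-X normalizes this way, or also masks parts of whole-body samples at random, is not stated here.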
