FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images

Abstract

We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. Because body shapes, poses, and clothing types vary widely, most existing methods require hours of per-subject optimization at inference time, which limits their practical use. In contrast, we learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization. Specifically, instead of rigging the avatar with shared skinning weights, we jointly infer the personalized avatar shape, skinning weights, and pose-dependent deformations, which improves overall geometric fidelity and reduces deformation artifacts. Moreover, to normalize pose variations and resolve the coupled ambiguity between canonical shapes and skinning weights, we design a 3D canonicalization process that produces pixel-aligned initial conditions, which helps reconstruct fine-grained geometric details. We then propose a multi-frame feature aggregation that robustly reduces artifacts introduced during canonicalization and fuses a plausible avatar that preserves person-specific identity. Finally, we train the model end to end on a large-scale capture dataset that contains diverse human subjects paired with high-quality 3D scans. Extensive experiments show that our method produces more authentic reconstruction and animation than state-of-the-art methods and generalizes directly to inputs from casually taken phone photos. The project page and code are available at https://github.com/rongakowang/FRESA.
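To make the role of per-vertex skinning weights concrete, the following is a minimal sketch of linear blend skinning, the standard way a skinned avatar is posed. The abstract does not state which skinning formulation FRESA uses, so linear blend skinning is an assumption here, and every name in the code (`linear_blend_skinning`, `personalized_weights`, the toy bone transforms) is hypothetical and not taken from the FRESA code base.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat4 = Sequence[Sequence[float]]  # row-major 4x4 homogeneous transform


def transform_point(m: Mat4, p: Vec3) -> List[float]:
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = p
    return [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3)]


def linear_blend_skinning(
    vertices: Sequence[Vec3],
    weights: Sequence[Sequence[float]],
    bone_transforms: Sequence[Mat4],
) -> List[List[float]]:
    """Pose canonical vertices: v_i' = sum_j w_ij * (T_j applied to v_i)."""
    posed = []
    for v, w in zip(vertices, weights):
        # Each vertex distributes its motion over the bones; weights sum to 1.
        if abs(sum(w) - 1.0) > 1e-6:
            raise ValueError("skinning weights of each vertex must sum to 1")
        out = [0.0, 0.0, 0.0]
        for w_j, t_j in zip(w, bone_transforms):
            moved = transform_point(t_j, v)
            for k in range(3):
                out[k] += w_j * moved[k]
        posed.append(out)
    return posed


# Toy example (hypothetical data): two bones, two canonical vertices.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
SHIFT_X = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # translate +2 in x

canonical_vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
# Per-vertex weights; the second vertex is shared equally by both bones.
personalized_weights = [[1.0, 0.0], [0.5, 0.5]]

posed_vertices = linear_blend_skinning(
    canonical_vertices, personalized_weights, [IDENTITY, SHIFT_X]
)
```

In this sketch, changing `personalized_weights` changes how each vertex follows the bones, which is why weights inferred per subject, rather than shared across subjects, can affect deformation quality for a given body shape and clothing.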

