FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
March 24, 2025
Authors: Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh
cs.AI
Abstract
We present a novel method for reconstructing personalized 3D human avatars
with realistic animation from only a few images. Due to the large variations in
body shapes, poses, and cloth types, existing methods mostly require hours of
per-subject optimization during inference, which limits their practical
applications. In contrast, we learn a universal prior from over a thousand
clothed humans to achieve instant feedforward generation and zero-shot
generalization. Specifically, instead of rigging the avatar with shared
skinning weights, we jointly infer personalized avatar shape, skinning weights,
and pose-dependent deformations, which effectively improves overall geometric
fidelity and reduces deformation artifacts. Moreover, to normalize pose
variations and resolve coupled ambiguity between canonical shapes and skinning
weights, we design a 3D canonicalization process to produce pixel-aligned
initial conditions, which helps to reconstruct fine-grained geometric details.
We then propose a multi-frame feature aggregation module that robustly reduces
artifacts introduced during canonicalization and fuses a plausible avatar
preserving person-specific identity. Finally, we train the model in an end-to-end
framework on a large-scale capture dataset, which contains diverse human
subjects paired with high-quality 3D scans. Extensive experiments show that our
method generates more authentic reconstructions and animations than
state-of-the-art approaches, and can be directly generalized to inputs from
casually taken phone photos. Project page and code are available at
https://github.com/rongakowang/FRESA.
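The abstract describes rigging each avatar with personalized (rather than shared) skinning weights plus pose-dependent deformations. A minimal sketch of the underlying skinning operation, linear blend skinning with per-vertex weights and an additive pose-dependent offset, is shown below; the function name, argument shapes, and overall structure are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def lbs(canonical_verts, skin_weights, joint_transforms, pose_delta=None):
    """Pose canonical vertices via linear blend skinning.

    canonical_verts:  (V, 3) canonical-space vertex positions
    skin_weights:     (V, J) per-vertex weights over J joints (rows sum to 1);
                      in FRESA these are inferred per subject, not shared
    joint_transforms: (J, 4, 4) rigid transform of each joint
    pose_delta:       (V, 3) optional pose-dependent deformation offsets
    """
    verts = canonical_verts if pose_delta is None else canonical_verts + pose_delta
    # Blend the joint transforms per vertex: (V, 4, 4)
    blended = np.einsum('vj,jab->vab', skin_weights, joint_transforms)
    # Apply each blended transform in homogeneous coordinates
    homo = np.concatenate([verts, np.ones((verts.shape[0], 1))], axis=1)
    posed = np.einsum('vab,vb->va', blended, homo)
    return posed[:, :3]
```

With identity joint transforms this returns the canonical vertices unchanged; personalized weights change how each vertex follows the joints, which is what reduces deformation artifacts relative to a single shared weight template.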