

FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images

March 24, 2025
作者: Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh
cs.AI

Abstract

We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. Due to the large variations in body shapes, poses, and cloth types, existing methods mostly require hours of per-subject optimization during inference, which limits their practical applications. In contrast, we learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization. Specifically, instead of rigging the avatar with shared skinning weights, we jointly infer personalized avatar shape, skinning weights, and pose-dependent deformations, which effectively improves overall geometric fidelity and reduces deformation artifacts. Moreover, to normalize pose variations and resolve the coupled ambiguity between canonical shapes and skinning weights, we design a 3D canonicalization process to produce pixel-aligned initial conditions, which helps to reconstruct fine-grained geometric details. We then propose a multi-frame feature aggregation scheme that robustly reduces artifacts introduced by canonicalization and fuses a plausible avatar that preserves person-specific identity. Finally, we train the model in an end-to-end framework on a large-scale capture dataset, which contains diverse human subjects paired with high-quality 3D scans. Extensive experiments show that our method generates more authentic reconstruction and animation than state-of-the-art methods, and can be directly generalized to inputs from casually taken phone photos. The project page and code are available at https://github.com/rongakowang/FRESA.
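The rigging idea the abstract builds on, deforming a canonical shape with per-vertex skinning weights and per-joint transforms, is standard linear blend skinning. Below is a minimal NumPy sketch of that operation; the function name, array shapes, and test values are illustrative assumptions, not code from the paper:

```python
import numpy as np

def linear_blend_skinning(verts, weights, rotations, translations):
    """Deform canonical vertices with per-vertex skinning weights.

    verts:        (V, 3) canonical vertex positions
    weights:      (V, J) skinning weights (each row sums to 1)
    rotations:    (J, 3, 3) per-joint rotation matrices
    translations: (J, 3)    per-joint translations
    """
    # Apply every joint transform to every vertex: shape (J, V, 3)
    per_joint = np.einsum('jab,vb->jva', rotations, verts) + translations[:, None, :]
    # Blend the per-joint results with each vertex's weights: shape (V, 3)
    return np.einsum('vj,jva->va', weights, per_joint)
```

A method like the one described here would predict `weights` per subject rather than copying them from a shared template, so that clothing and body shape can change how each vertex follows the skeleton.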

