FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
March 24, 2025
Authors: Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh
cs.AI
Abstract
We present a novel method for reconstructing personalized 3D human avatars
with realistic animation from only a few images. Due to the large variations in
body shapes, poses, and cloth types, existing methods mostly require hours of
per-subject optimization during inference, which limits their practical
applications. In contrast, we learn a universal prior from over a thousand
clothed humans to achieve instant feedforward generation and zero-shot
generalization. Specifically, instead of rigging the avatar with shared
skinning weights, we jointly infer personalized avatar shape, skinning weights,
and pose-dependent deformations, which effectively improves overall geometric
fidelity and reduces deformation artifacts. Moreover, to normalize pose
variations and resolve the coupled ambiguity between canonical shapes and
skinning
weights, we design a 3D canonicalization process to produce pixel-aligned
initial conditions, which helps to reconstruct fine-grained geometric details.
We then propose a multi-frame feature aggregation scheme to robustly reduce
artifacts introduced during canonicalization and fuse a plausible avatar that
preserves person-specific identity. Finally, we train the model in an
end-to-end
framework on a large-scale capture dataset, which contains diverse human
subjects paired with high-quality 3D scans. Extensive experiments show that our
method generates more authentic reconstructions and animations than
state-of-the-art approaches, and can be directly generalized to inputs from
casually taken phone photos. The project page and code are available at
https://github.com/rongakowang/FRESA.
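
The abstract's rigging claim — personalized skinning weights plus pose-dependent deformations, rather than a shared template's weights — corresponds to a standard linear blend skinning (LBS) step where both the per-vertex weights and the corrective offsets are predicted per subject. The sketch below is a minimal NumPy illustration of that step; the function name, argument shapes, and interface are assumptions for illustration, not FRESA's actual API.

```python
import numpy as np

def skin_avatar(canon_verts, skin_weights, bone_transforms, pose_offsets=None):
    """Minimal linear blend skinning (LBS) sketch.

    canon_verts:     (V, 3) canonical avatar vertices
    skin_weights:    (V, J) personalized per-vertex skinning weights (rows sum to 1)
    bone_transforms: (J, 4, 4) rigid transforms of the J bones for the target pose
    pose_offsets:    (V, 3) optional pose-dependent corrective deformations
    """
    if pose_offsets is not None:
        # Apply pose-dependent correctives in canonical space before skinning.
        canon_verts = canon_verts + pose_offsets
    # Homogeneous coordinates: (V, 4)
    verts_h = np.concatenate([canon_verts, np.ones((len(canon_verts), 1))], axis=1)
    # Blend the bone transforms per vertex with the personalized weights: (V, 4, 4)
    blended = np.einsum("vj,jab->vab", skin_weights, bone_transforms)
    # Transform each vertex by its blended matrix; drop the homogeneous coordinate.
    return np.einsum("vab,vb->va", blended, verts_h)[:, :3]
```

Inferring `skin_weights` jointly with the canonical shape, instead of copying a template's weights, is what the abstract credits for the improved geometric fidelity and reduced deformation artifacts.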
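The multi-frame feature aggregation is the other concrete operation the abstract names, but it does not specify the operator. A plausible minimal sketch, assuming pixel-aligned per-frame feature maps and a learned per-pixel confidence score (both shapes and the scoring input are illustrative assumptions):

```python
import numpy as np

def aggregate_frames(features, scores):
    """Fuse per-frame pixel-aligned features into a single avatar feature map.

    features: (F, C, H, W) feature maps from F canonicalized input frames
    scores:   (F, 1, H, W) per-pixel confidence logits (e.g., low where
              canonicalization introduced artifacts)
    """
    # Softmax over the frame axis, numerically stabilized.
    w = np.exp(scores - scores.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    # Down-weight unreliable frames per pixel, then pool: (C, H, W)
    return (w * features).sum(axis=0)
```

Per-pixel weighting lets reliable observations dominate wherever another frame's canonicalization failed, which matches the abstract's stated goal of suppressing canonicalization artifacts while preserving person-specific detail.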