
FRESA:Feedforward Reconstruction of Personalized Skinned Avatars from Few Images

March 24, 2025
Authors: Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh
cs.AI

Abstract

We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. Due to large variations in body shape, pose, and clothing type, existing methods mostly require hours of per-subject optimization at inference time, which limits their practical applications. In contrast, we learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization. Specifically, instead of rigging the avatar with shared skinning weights, we jointly infer personalized avatar shape, skinning weights, and pose-dependent deformations, which effectively improves overall geometric fidelity and reduces deformation artifacts. Moreover, to normalize pose variations and resolve the coupled ambiguity between canonical shapes and skinning weights, we design a 3D canonicalization process that produces pixel-aligned initial conditions, which helps to reconstruct fine-grained geometric details. We then propose a multi-frame feature aggregation scheme that robustly reduces artifacts introduced by canonicalization and fuses a plausible avatar that preserves person-specific identity. Finally, we train the model end to end on a large-scale capture dataset containing diverse human subjects paired with high-quality 3D scans. Extensive experiments show that our method generates more authentic reconstructions and animations than state-of-the-art methods, and generalizes directly to casually taken phone photos. The project page and code are available at https://github.com/rongakowang/FRESA.
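
The animation pipeline described above centers on linear blend skinning (LBS) with per-subject skinning weights plus pose-dependent corrective deformations. The sketch below illustrates that general formulation in NumPy; the function name, the 4x4 bone-transform convention, and the additive corrective term are illustrative assumptions, not FRESA's actual implementation.

```python
import numpy as np

def lbs_deform(canonical_verts, skin_weights, bone_transforms, pose_correctives=None):
    """Deform canonical vertices with linear blend skinning (LBS).

    canonical_verts:  (V, 3) vertices of the canonical (rest-pose) avatar.
    skin_weights:     (V, J) per-vertex weights over J joints (rows sum to 1);
                      FRESA infers these per subject instead of sharing them.
    bone_transforms:  (J, 4, 4) rigid transforms taking each joint from the
                      canonical pose to the target pose.
    pose_correctives: optional (V, 3) pose-dependent displacements applied in
                      canonical space before skinning (a stand-in for the
                      paper's pose-dependent deformations).
    """
    verts = canonical_verts
    if pose_correctives is not None:
        verts = verts + pose_correctives  # correct canonical shape for this pose

    # Homogeneous coordinates: (V, 4).
    verts_h = np.concatenate([verts, np.ones((verts.shape[0], 1))], axis=1)

    # Blend the per-joint transforms by the skinning weights: (V, 4, 4).
    blended = np.einsum("vj,jrc->vrc", skin_weights, bone_transforms)

    # Apply each vertex's blended transform and drop the homogeneous coordinate.
    posed = np.einsum("vrc,vc->vr", blended, verts_h)
    return posed[:, :3]
```

Inferring the skinning weights jointly with the canonical shape is what lets vertices on loose clothing deviate from the rigid behavior a shared template would impose, which is the deformation artifact the abstract says joint inference reduces.

The multi-frame aggregation step can likewise be pictured as a permutation-invariant fusion of per-frame canonicalized features. The weighted-mean operator and the reliability scores below are hypothetical; the abstract does not specify the aggregation function.

```python
import numpy as np

def aggregate_frames(frame_features, reliability):
    """Fuse per-frame features into a single avatar feature vector.

    frame_features: (F, C) one pixel-aligned feature vector per input frame.
    reliability:    (F,) non-negative scores that down-weight frames whose
                    canonicalization introduced artifacts.
    """
    w = reliability / (reliability.sum() + 1e-8)      # normalize to a distribution
    return (frame_features * w[:, None]).sum(axis=0)  # weighted mean, shape (C,)
```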
