Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling

Abstract

Current 3D human animation methods struggle to achieve photorealism: kinematics-based approaches lack non-rigid dynamics (e.g., clothing dynamics), while methods that leverage video diffusion priors can synthesize non-rigid motion but suffer from quality artifacts and identity loss. To overcome these limitations, we present Ani3DHuman, a framework that marries kinematics-based animation with video diffusion priors. We first introduce a layered motion representation that disentangles rigid motion from residual non-rigid motion. The rigid motion is generated by a kinematic method and rendered coarsely; this coarse rendering then guides the video diffusion model in generating video sequences that restore the residual non-rigid motion. However, this restoration task, which relies on diffusion sampling, is highly challenging: the initial renderings are out-of-distribution, causing standard deterministic ODE samplers to fail. We therefore propose a novel self-guided stochastic sampling method, which addresses the out-of-distribution problem by combining stochastic sampling (for photorealistic quality) with self-guidance (for identity fidelity). The restored videos provide high-quality supervision, enabling the optimization of the residual non-rigid motion field. Extensive experiments demonstrate that Ani3DHuman can generate photorealistic 3D human animation, outperforming existing methods. Code is available at https://github.com/qiisun/ani3dhuman.
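
As a rough illustration of the layered motion representation described above, the following Python sketch composes a rigid layer with a per-point non-rigid residual. It is not the authors' implementation: a single global rotation and translation stands in for the kinematic method, the residual is a hand-set offset rather than an optimized field, and all function and variable names are hypothetical.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation about the z-axis (stand-in for a kinematic pose)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rigid_motion(points, rotation, translation):
    """Rigid layer: apply one rotation and translation to N x 3 points."""
    return points @ rotation.T + translation

def layered_motion(points, rotation, translation, residual):
    """Rigid layer first, then a per-point non-rigid residual added on top."""
    return rigid_motion(points, rotation, translation) + residual

# Two canonical points (assumed toy data, not from the paper).
canonical = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
R = rotation_z(np.pi / 2)
t = np.array([0.0, 0.0, 1.0])

# Hypothetical residual: only the second point moves non-rigidly.
residual = np.zeros_like(canonical)
residual[1] = [0.0, 0.0, 0.05]

rigid_only = rigid_motion(canonical, R, t)
posed = layered_motion(canonical, R, t, residual)
```

Keeping the two layers additive means the residual can be set or optimized without touching the rigid pose, which is the kind of separation the abstract attributes to the layered representation.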
