Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling

Abstract

Current 3D human animation methods struggle to achieve photorealism: kinematics-based approaches lack non-rigid dynamics (e.g., clothing dynamics), while methods that leverage video diffusion priors can synthesize non-rigid motion but suffer from quality artifacts and identity loss. To overcome these limitations, we present Ani3DHuman, a framework that marries kinematics-based animation with video diffusion priors. We first introduce a layered motion representation that disentangles rigid motion from residual non-rigid motion. The rigid motion is generated by a kinematic method and yields a coarse rendering, which guides the video diffusion model to generate video sequences that restore the residual non-rigid motion. However, this restoration task, which relies on diffusion sampling, is highly challenging: the initial renderings are out-of-distribution, causing standard deterministic ODE samplers to fail. We therefore propose a novel self-guided stochastic sampling method, which effectively addresses the out-of-distribution problem by combining stochastic sampling (for photorealistic quality) with self-guidance (for identity fidelity). The restored videos provide high-quality supervision, enabling the optimization of the residual non-rigid motion field. Extensive experiments demonstrate that Ani3DHuman generates photorealistic 3D human animation and outperforms existing methods. Code is available at https://github.com/qiisun/ani3dhuman.
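To make the idea of a layered motion representation concrete, the following is a minimal sketch. It is not the authors' implementation. It assumes linear blend skinning (LBS) as the kinematic method for the rigid layer, and it models the residual non-rigid layer as an additive per-point offset that depends on the canonical position and time. The abstract confirms neither choice. All names (`animate`, `ResidualField`, `translation`) are hypothetical. In the paper the residual field is optimized using the restored videos as supervision; here it is only a hand-written toy function.

```python
from typing import Callable, Sequence

Vec3 = tuple[float, float, float]
Mat4 = Sequence[Sequence[float]]  # 4x4 homogeneous rigid transform (row-major)
# Hypothetical residual (non-rigid) field: (canonical point, time) -> 3D offset.
ResidualField = Callable[[Vec3, float], Vec3]


def apply_transform(m: Mat4, p: Vec3) -> Vec3:
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = p
    return (
        m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3],
        m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3],
        m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3],
    )


def linear_blend_skinning(points: Sequence[Vec3], weights: Sequence[Sequence[float]],
                          bone_transforms: Sequence[Mat4]) -> list[Vec3]:
    """Rigid layer: blend each point's per-bone transformed positions by its skinning weights."""
    posed = []
    for p, w in zip(points, weights):
        acc = [0.0, 0.0, 0.0]
        for wj, mj in zip(w, bone_transforms):
            q = apply_transform(mj, p)
            for k in range(3):
                acc[k] += wj * q[k]
        posed.append((acc[0], acc[1], acc[2]))
    return posed


def animate(points: Sequence[Vec3], weights: Sequence[Sequence[float]],
            bone_transforms: Sequence[Mat4], residual: ResidualField, t: float) -> list[Vec3]:
    """Layered motion (assumed additive): rigid LBS pose plus a residual non-rigid offset."""
    rigid = linear_blend_skinning(points, weights, bone_transforms)
    out = []
    for p, r in zip(points, rigid):
        d = residual(p, t)  # offset is queried at the canonical (rest-pose) point
        out.append((r[0] + d[0], r[1] + d[1], r[2] + d[2]))
    return out


def translation(tx: float, ty: float, tz: float) -> list[list[float]]:
    """Homogeneous translation matrix, used here as a trivial bone transform."""
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def no_residual(p: Vec3, t: float) -> Vec3:
    """Rigid-only animation: the residual layer contributes nothing."""
    return (0.0, 0.0, 0.0)


def toy_sway(p: Vec3, t: float) -> Vec3:
    """Hypothetical cloth-like sway that grows with height and time."""
    return (0.1 * p[1] * t, 0.0, 0.0)


if __name__ == "__main__":
    points = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    weights = [[1.0, 0.0], [0.5, 0.5]]  # second point is shared between two bones
    bones = [translation(0.0, 0.0, 0.0), translation(1.0, 0.0, 0.0)]
    print(animate(points, weights, bones, no_residual, 1.0))
    print(animate(points, weights, bones, toy_sway, 1.0))
```

With `no_residual`, the output is the purely kinematic (rigid) pose, which is the analogue of the coarse rendering described above. Swapping in a nonzero residual field adds non-rigid detail on top of that pose without changing the skinning layer. This separation is what allows the two layers to be produced by different means.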