AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
June 24, 2025
Authors: Zehuan Huang, Haoran Feng, Yangtian Sun, Yuanchen Guo, Yanpei Cao, Lu Sheng
cs.AI
Abstract
We present AnimaX, a feed-forward 3D animation framework that bridges the
motion priors of video diffusion models with the controllable structure of
skeleton-based animation. Traditional motion synthesis methods are either
restricted to fixed skeletal topologies or require costly optimization in
high-dimensional deformation spaces. In contrast, AnimaX effectively transfers
video-based motion knowledge to the 3D domain, supporting diverse articulated
meshes with arbitrary skeletons. Our method represents 3D motion as multi-view,
multi-frame 2D pose maps, and enables joint video-pose diffusion conditioned on
template renderings and a textual motion prompt. We introduce shared positional
encodings and modality-aware embeddings to ensure spatial-temporal alignment
between video and pose sequences, effectively transferring video priors to
the motion generation task. The resulting multi-view pose sequences are
triangulated into 3D joint positions and converted into mesh animation via
inverse kinematics. Trained on a newly curated dataset of 160,000 rigged
sequences, AnimaX achieves state-of-the-art results on VBench in
generalization, motion fidelity, and efficiency, offering a scalable solution
for category-agnostic 3D animation. Project page:
https://anima-x.github.io/.
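
The lifting step the abstract describes (triangulating generated multi-view 2D pose sequences into 3D joint positions) follows the standard multi-view geometry recipe. The sketch below is illustrative only, not the authors' code: the function name triangulate_joint and the inputs proj_mats and points_2d are assumed, and it uses the classic direct linear transform (DLT), which AnimaX's actual implementation may refine.

import numpy as np

def triangulate_joint(proj_mats: np.ndarray, points_2d: np.ndarray) -> np.ndarray:
    """Triangulate one joint from V calibrated views via DLT.

    proj_mats: (V, 3, 4) camera projection matrices.
    points_2d: (V, 2) pixel coordinates of the joint in each view.
    Returns the (3,) 3D joint position.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous
        # 3D point X: u * (P[2] @ X) = P[0] @ X and v * (P[2] @ X) = P[1] @ X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)              # (2V, 4) homogeneous system A @ X ~ 0
    _, _, vt = np.linalg.svd(A)     # least-squares solution is the right
    X = vt[-1]                      # singular vector of the smallest value
    return X[:3] / X[3]             # de-homogenize to Euclidean coordinates

Applied per joint and per frame, this converts the generated multi-view pose maps into 3D joint trajectories, which inverse kinematics then retargets onto the rigged mesh as described in the abstract.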