AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
June 24, 2025
作者: Zehuan Huang, Haoran Feng, Yangtian Sun, Yuanchen Guo, Yanpei Cao, Lu Sheng
cs.AI
Abstract
We present AnimaX, a feed-forward 3D animation framework that bridges the
motion priors of video diffusion models with the controllable structure of
skeleton-based animation. Traditional motion synthesis methods are either
restricted to fixed skeletal topologies or require costly optimization in
high-dimensional deformation spaces. In contrast, AnimaX effectively transfers
video-based motion knowledge to the 3D domain, supporting diverse articulated
meshes with arbitrary skeletons. Our method represents 3D motion as multi-view,
multi-frame 2D pose maps, and enables joint video-pose diffusion conditioned on
template renderings and a textual motion prompt. We introduce shared positional
encodings and modality-aware embeddings to ensure spatial-temporal alignment
between video and pose sequences, effectively transferring video priors to the
motion generation task. The resulting multi-view pose sequences are
triangulated into 3D joint positions and converted into mesh animation via
inverse kinematics. Trained on a newly curated dataset of 160,000 rigged
sequences, AnimaX achieves state-of-the-art results on VBench in
generalization, motion fidelity, and efficiency, offering a scalable solution
for category-agnostic 3D animation. Project page:
https://anima-x.github.io/.
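The abstract's recovery step, lifting multi-view 2D pose maps to 3D joint positions by triangulation, can be illustrated with a standard direct linear transform (DLT). This is a generic sketch of multi-view triangulation, not the paper's actual implementation; the function name and camera setup are assumptions for illustration.

```python
import numpy as np

def triangulate_joint(proj_mats, points_2d):
    """Linear (DLT) triangulation of one joint from several views.

    proj_mats: list of 3x4 camera projection matrices, one per view.
    points_2d: list of (x, y) image coordinates of the joint, one per view.
    Returns the estimated 3D joint position as a length-3 array.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Two synthetic views: identity camera and a camera shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([1.0, 2.0, 5.0])
pt1 = (X_true[0] / X_true[2], X_true[1] / X_true[2])
pt2 = ((X_true[0] - 1.0) / X_true[2], X_true[1] / X_true[2])
X_est = triangulate_joint([P1, P2], [pt1, pt2])
```

Running this per joint and per frame yields the 3D joint trajectories that an inverse-kinematics solver can then map onto the rigged mesh.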