RigMo: Unifying Rig and Motion Learning for Generative Animation
January 10, 2026
Authors: Hao Zhang, Jiahao Luo, Bohui Wan, Yizhou Zhao, Zongrui Li, Michael Vasilkovsky, Chaoyang Wang, Jian Wang, Narendra Ahuja, Bing Zhou
cs.AI
Abstract
Despite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation, are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent process, undermining scalability and interpretability. We present RigMo, a unified generative framework that jointly learns rig and motion directly from raw mesh sequences, without any human-provided rig annotations. RigMo encodes per-vertex deformations into two compact latent spaces: a rig latent that decodes into explicit Gaussian bones and skinning weights, and a motion latent that produces time-varying SE(3) transformations. Together, these outputs define an animatable mesh with explicit structure and coherent motion, enabling feed-forward rig and motion inference for deformable objects. Beyond unified rig-motion discovery, we introduce a Motion-DiT model operating in RigMo's latent space and demonstrate that these structure-aware latents can naturally support downstream motion generation tasks. Experiments on DeformingThings4D, Objaverse-XL, and TrueBones demonstrate that RigMo learns smooth, interpretable, and physically plausible rigs, while achieving superior reconstruction and category-level generalization compared to existing auto-rigging and deformation baselines. RigMo establishes a new paradigm for unified, structure-aware, and scalable dynamic 3D modeling.
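The abstract describes an animatable mesh defined by explicit bones, skinning weights, and time-varying SE(3) bone transforms that together drive per-vertex deformation. As a minimal sketch of that general mechanism, not RigMo's actual decoder, the snippet below applies standard linear blend skinning; the function name `lbs_deform` and all array shapes are illustrative assumptions.

```python
# Minimal sketch: per-vertex deformation from per-bone SE(3) transforms and
# skinning weights via standard linear blend skinning. This illustrates the
# general mechanism the abstract refers to; it is not RigMo's decoder, and all
# names/shapes here are assumptions.
import numpy as np

def lbs_deform(vertices, rest_bone_poses, bone_transforms, skin_weights):
    """
    vertices:        (V, 3) rest-pose vertex positions
    rest_bone_poses: (B, 4, 4) rest-pose bone-to-world SE(3) matrices
    bone_transforms: (B, 4, 4) posed bone-to-world SE(3) matrices at one time step
    skin_weights:    (V, B) per-vertex skinning weights (rows sum to 1)
    returns:         (V, 3) deformed vertex positions
    """
    # Relative transform taking each bone from its rest pose to the posed frame.
    rel = bone_transforms @ np.linalg.inv(rest_bone_poses)                 # (B, 4, 4)

    # Rest-pose vertices in homogeneous coordinates.
    v_h = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)  # (V, 4)

    # Transform every vertex by every bone, then blend with the skinning weights.
    per_bone = np.einsum('bij,vj->vbi', rel, v_h)                          # (V, B, 4)
    blended = np.einsum('vb,vbi->vi', skin_weights, per_bone)              # (V, 4)
    return blended[:, :3]

# Tiny usage example: two bones, three vertices, identity rest pose,
# with the second bone translated along +x in the posed frame.
verts = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]])
rest = np.tile(np.eye(4), (2, 1, 1))
posed = rest.copy()
posed[1, 0, 3] = 0.2                        # move bone 1 by 0.2 along x
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(lbs_deform(verts, rest, posed, weights))  # vertices follow their dominant bones
```

In a feed-forward setting like the one the abstract describes, the bone poses and skinning weights would be predicted from the rig latent and the per-frame SE(3) transforms from the motion latent, rather than supplied by hand as in this toy example.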