Puppeteer: Rig and Animate Your 3D Models
August 14, 2025
Authors: Chaoyue Song, Xiu Li, Fan Yang, Zhongcong Xu, Jiacheng Wei, Fayao Liu, Jiashi Feng, Guosheng Lin, Jianfeng Zhang
cs.AI
Abstract
Modern interactive applications increasingly demand dynamic 3D content, yet
the transformation of static 3D models into animated assets constitutes a
significant bottleneck in content creation pipelines. While recent advances in
generative AI have revolutionized static 3D model creation, rigging and
animation continue to depend heavily on expert intervention. We present
Puppeteer, a comprehensive framework that addresses both automatic rigging and
animation for diverse 3D objects. Our system first predicts plausible skeletal
structures via an auto-regressive transformer that introduces a joint-based
tokenization strategy for compact representation and a hierarchical ordering
methodology with stochastic perturbation that enhances bidirectional learning
capabilities. It then infers skinning weights via an attention-based
architecture incorporating topology-aware joint attention that explicitly
encodes inter-joint relationships based on skeletal graph distances. Finally,
we complement these rigging advances with a differentiable optimization-based
animation pipeline that generates stable, high-fidelity animations while being
computationally more efficient than existing approaches. Extensive evaluations
across multiple benchmarks demonstrate that our method significantly
outperforms state-of-the-art techniques in both skeletal prediction accuracy
and skinning quality. The system robustly processes diverse 3D content, ranging
from professionally designed game assets to AI-generated shapes, producing
temporally coherent animations that eliminate the jittering issues common in
existing methods.
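
The abstract says the skinning stage encodes inter-joint relationships based on skeletal graph distances. As a rough illustration of that quantity only, the sketch below computes pairwise hop distances between joints of a skeleton tree with breadth-first search. The function name `joint_graph_distances`, the parent-array input format, and the toy skeleton are assumptions made for this example; the abstract does not specify how Puppeteer represents skeletons or how the distances enter its attention mechanism.

```python
from collections import deque


def joint_graph_distances(parents):
    """Return the pairwise hop-distance matrix for a skeleton tree.

    parents[i] is the index of joint i's parent, or -1 for the root.
    (This input format is an assumption for illustration.)
    """
    n = len(parents)

    # Build an undirected adjacency list from the parent array.
    adjacency = [[] for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            adjacency[child].append(parent)
            adjacency[parent].append(child)

    # Run one breadth-first search per joint; -1 marks "not yet visited".
    distances = [[-1] * n for _ in range(n)]
    for source in range(n):
        distances[source][source] = 0
        queue = deque([source])
        while queue:
            joint = queue.popleft()
            for neighbor in adjacency[joint]:
                if distances[source][neighbor] < 0:
                    distances[source][neighbor] = distances[source][joint] + 1
                    queue.append(neighbor)
    return distances


# Hypothetical toy skeleton:
# 0 = root, 1 = spine, 2 = left arm, 3 = right arm, 4 = left hand
toy_parents = [-1, 0, 1, 1, 2]
toy_distances = joint_graph_distances(toy_parents)

for row in toy_distances:
    print(row)
```

A matrix like this could, for instance, be turned into a per-pair bias or embedding inside an attention layer, so that joints close together on the skeleton are treated differently from distant ones. Whether Puppeteer does it this way is not stated in the abstract.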