Puppeteer: Rig and Animate Your 3D Models
August 14, 2025
Authors: Chaoyue Song, Xiu Li, Fan Yang, Zhongcong Xu, Jiacheng Wei, Fayao Liu, Jiashi Feng, Guosheng Lin, Jianfeng Zhang
cs.AI
Abstract
Modern interactive applications increasingly demand dynamic 3D content, yet
the transformation of static 3D models into animated assets constitutes a
significant bottleneck in content creation pipelines. While recent advances in
generative AI have revolutionized static 3D model creation, rigging and
animation continue to depend heavily on expert intervention. We present
Puppeteer, a comprehensive framework that addresses both automatic rigging and
animation for diverse 3D objects. Our system first predicts plausible skeletal
structures via an auto-regressive transformer that introduces a joint-based
tokenization strategy for compact representation and a hierarchical ordering
methodology with stochastic perturbation that enhances bidirectional learning
capabilities. It then infers skinning weights via an attention-based
architecture incorporating topology-aware joint attention that explicitly
encodes inter-joint relationships based on skeletal graph distances. Finally,
we complement these rigging advances with a differentiable optimization-based
animation pipeline that generates stable, high-fidelity animations while being
computationally more efficient than existing approaches. Extensive evaluations
across multiple benchmarks demonstrate that our method significantly
outperforms state-of-the-art techniques in both skeletal prediction accuracy
and skinning quality. The system robustly processes diverse 3D content, ranging
from professionally designed game assets to AI-generated shapes, producing
temporally coherent animations that eliminate the jittering issues common in
existing methods.
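
The abstract says the skinning stage uses skeletal graph distances between joints. As a minimal sketch of what such a distance is, and not the authors' implementation, the snippet below computes hop counts between every pair of joints with a breadth-first search. The parent-array representation, the function name joint_graph_distances, and the five-joint example skeleton are all assumptions made for illustration; how Puppeteer feeds these distances into its attention layers is not specified in the abstract.

```python
from collections import deque


def joint_graph_distances(parents):
    """Return an n x n matrix of hop distances between joints.

    parents[i] is the index of joint i's parent, or -1 for the root.
    This representation is an assumption for illustration only.
    """
    n = len(parents)

    # Build an undirected adjacency list from the parent array.
    neighbors = [[] for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            neighbors[child].append(parent)
            neighbors[parent].append(child)

    # Run one breadth-first search per joint; -1 marks "not yet visited".
    dist = [[-1] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            j = queue.popleft()
            for k in neighbors[j]:
                if dist[start][k] < 0:
                    dist[start][k] = dist[start][j] + 1
                    queue.append(k)
    return dist


# Hypothetical skeleton: 0 = root, 1 = spine, 2 = head, 3 = arm, 4 = hand.
parents = [-1, 0, 1, 1, 3]
distances = joint_graph_distances(parents)
```

In this toy skeleton the head and the hand are three hops apart (head, spine, arm, hand). A matrix like this is one plausible input for an attention bias that treats nearby joints differently from distant ones.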