
MoZoo: Unleashing Video Diffusion power in animal fur and muscle simulation

April 8, 2026
Authors: Dongxia Liu, Jie Ma, Xiaochen Yang, Jiancheng Zhang, Bin Xia, Zhehan Kan, Nisha Huang, Jun Liang, Wenming Yang, Jin Li
cs.AI

Abstract

Creating cinematic-quality animal effects requires precise modeling of muscle and fur dynamics, a process that remains both labor-intensive and computationally expensive in traditional production workflows. While generative diffusion models have shown promise in diverse artistic workflows, their capacity for high-fidelity animal simulation remains largely unexploited. We present MoZoo, a generative dynamics solver that bypasses conventional refinement to synthesize high-fidelity animal videos from coarse meshes under multimodal guidance. We propose Role-Aware RoPE (RAR-RoPE), which employs role-based index remapping to synchronize motion alignment while decoupling reference information via fixed temporal offsets. Complementing this, Asymmetric Decoupled Attention partitions the latent sequence to enforce a unidirectional information flow, preventing feature interference and improving computational efficiency. To address the scarcity of high-quality training data, we introduce MoZoo-Data, a synthetic-to-real pipeline that leverages a rendering engine and an inverse mapping approach to construct a large-scale dataset of paired sequences. Furthermore, we establish MoZooBench, a comprehensive benchmark with 120 mesh-video pairs. Experimental results demonstrate that MoZoo achieves high-fidelity fur simulation across diverse animal skeletons and layouts while preserving superior temporal and structural consistency.
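
The abstract names two mechanisms without giving their details, so the following is only an illustrative sketch of how they might look, not the authors' implementation. It assumes three token roles (`"mesh"`, `"video"`, `"reference"`), assumes that a mesh frame and the video frame at the same time step share one temporal index, assumes an offset of 1000 for reference tokens, and assumes that information flows from conditioning tokens to video tokens and not back. All function names, role names, and constants here are hypothetical.

```python
# Illustrative sketch only. Role names, the offset value, and the direction of
# the attention mask are assumptions, not details taken from the MoZoo paper.

REFERENCE_OFFSET = 1000  # assumed fixed temporal offset for reference tokens


def temporal_indices(roles, num_frames, reference_offset=REFERENCE_OFFSET):
    """Return one temporal position index per token, chosen by the token's role.

    roles: list of (role, frame) pairs, role in {"video", "mesh", "reference"}.
    """
    indices = []
    for role, frame in roles:
        if role in ("video", "mesh"):
            if not 0 <= frame < num_frames:
                raise ValueError(f"frame {frame} outside [0, {num_frames})")
            # Assumed remapping: mesh frame t reuses the index of video frame t,
            # so rotary phases of the two roles line up in time.
            indices.append(frame)
        elif role == "reference":
            # Assumed decoupling: reference tokens sit at a fixed distance
            # from every motion frame.
            indices.append(reference_offset + frame)
        else:
            raise ValueError(f"unknown role: {role!r}")
    return indices


def asymmetric_mask(roles):
    """Return mask[q][k], True when query token q may attend to key token k."""
    n = len(roles)
    mask = [[False] * n for _ in range(n)]
    for q, (q_role, _) in enumerate(roles):
        for k, (k_role, _) in enumerate(roles):
            if q_role == "video":
                # Video (target) tokens read from every partition.
                mask[q][k] = True
            else:
                # Conditioning tokens never read from video tokens.
                mask[q][k] = k_role != "video"
    return mask


if __name__ == "__main__":
    demo = [("reference", 0), ("mesh", 0), ("mesh", 1), ("video", 0), ("video", 1)]
    print(temporal_indices(demo, num_frames=2))
    for row in asymmetric_mask(demo):
        print(["x" if allowed else "." for allowed in row])
```

Under these assumptions, the conditioning rows of the mask do not depend on the video tokens, so their keys and values could in principle be computed once and reused across denoising steps. That is one plausible reading of the efficiency claim, not something the abstract states.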