

ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion

January 22, 2026
Authors: Remy Sabathier, David Novotny, Niloy J. Mitra, Tom Monnier
cs.AI

Abstract

Generating animated 3D objects is at the heart of many applications, yet the most advanced works are typically difficult to apply in practice because of their restrictive setups, long runtimes, or limited quality. We introduce ActionMesh, a generative model that predicts production-ready 3D meshes "in action" in a feed-forward manner. Drawing inspiration from early video models, our key insight is to modify existing 3D diffusion models to include a temporal axis, resulting in a framework we dub "temporal 3D diffusion". Specifically, we first adapt the 3D diffusion stage to generate a sequence of synchronized latents representing time-varying and independent 3D shapes. Second, we design a temporal 3D autoencoder that translates a sequence of independent shapes into the corresponding deformations of a pre-defined reference shape, allowing us to build an animation. Combining these two components, ActionMesh generates animated 3D meshes from different inputs such as a monocular video, a text description, or even a 3D mesh with a text prompt describing its animation. Moreover, compared to previous approaches, our method is fast and produces results that are rig-free and topology-consistent, enabling rapid iteration and seamless applications like texturing and retargeting. We evaluate our model on standard video-to-4D benchmarks (Consistent4D, Objaverse) and report state-of-the-art performance in both geometric accuracy and temporal consistency, demonstrating that ActionMesh delivers animated 3D meshes with unprecedented speed and quality.
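
To make the two-stage design concrete, below is a minimal, hypothetical sketch of the pipeline described in the abstract. All class names, tensor shapes, and the toy denoising loop are placeholders of our own invention, not the authors' implementation or API; the sketch only illustrates the data flow from a sequence of per-frame shape latents to topology-consistent deformations of a reference mesh.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# All class/function names and shapes are hypothetical placeholders,
# not the authors' actual ActionMesh code.
import torch
import torch.nn as nn

T, D = 16, 512  # number of animation frames and latent dimension (assumed)

class TemporalShapeDiffusion(nn.Module):
    """Stand-in for the temporal 3D diffusion stage: denoises a sequence of
    per-frame shape latents jointly, so the frames stay synchronized."""
    def __init__(self, dim=D):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.denoiser = nn.TransformerEncoder(layer, num_layers=2)

    @torch.no_grad()
    def sample(self, cond, steps=8):
        z = torch.randn(1, T, D)          # one noisy latent per frame
        for _ in range(steps):            # toy update loop, not a real diffusion scheduler
            z = z - 0.1 * self.denoiser(z + cond)
        return z                          # sequence of independent 3D shape latents

class TemporalShapeAutoencoder(nn.Module):
    """Stand-in for the temporal 3D autoencoder: maps the latent sequence to
    per-frame offsets of a fixed reference mesh, keeping topology constant."""
    def __init__(self, n_verts=2048, dim=D):
        super().__init__()
        self.to_offsets = nn.Linear(dim, n_verts * 3)

    def forward(self, latents, ref_verts):
        offsets = self.to_offsets(latents).view(1, T, -1, 3)
        return ref_verts.unsqueeze(1) + offsets  # animated vertices, shared faces

# Toy usage: the conditioning signal could come from a video or text encoder.
cond = torch.zeros(1, T, D)               # placeholder conditioning features
ref_verts = torch.rand(1, 2048, 3)        # reference mesh vertices
latents = TemporalShapeDiffusion().sample(cond)
animated_verts = TemporalShapeAutoencoder()(latents, ref_verts)
print(animated_verts.shape)               # (1, T, 2048, 3): one deformed mesh per frame
```

Because every frame is expressed as an offset of the same reference vertices, the output mesh sequence shares a single topology, which is what makes downstream applications like texturing and retargeting straightforward, as the abstract notes.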