

DIMO: Diverse 3D Motion Generation for Arbitrary Objects

November 10, 2025
Authors: Linzhan Mou, Jiahui Lei, Chen Wang, Lingjie Liu, Kostas Daniilidis
cs.AI

Abstract

We present DIMO, a generative approach that produces diverse 3D motions for arbitrary objects from a single image. The core idea of our work is to leverage the rich priors in well-trained video models to extract common motion patterns and embed them into a shared low-dimensional latent space. Specifically, we first generate multiple videos of the same object with diverse motions. We then embed each motion into a latent vector and train a shared motion decoder to learn the distribution of motions, represented by a structured and compact motion representation, i.e., neural key point trajectories. Canonical 3D Gaussians are then driven by these key points and fused to model geometry and appearance. At inference time, the learned latent space lets us instantly sample diverse 3D motions in a single forward pass and supports several interesting applications, including 3D motion interpolation and language-guided motion generation. Our project page is available at https://linzhanm.github.io/dimo.
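To make the pipeline described in the abstract more concrete, below is a minimal PyTorch sketch of the two components it names: a shared motion decoder that maps a sampled motion latent to neural key point trajectories, and a step that drives canonical 3D Gaussian centers with those key points. All class names, tensor shapes, network sizes, and the distance-based skinning used for driving are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the DIMO-style pipeline; shapes, layer sizes, and
# the skinning scheme are assumptions made for illustration only.
import torch
import torch.nn as nn


class MotionDecoder(nn.Module):
    """Shared decoder: motion latent z -> neural key point trajectories."""

    def __init__(self, latent_dim=64, num_keypoints=32, num_frames=24):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.num_frames = num_frames
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_frames * num_keypoints * 3),
        )

    def forward(self, z):
        # z: (B, latent_dim) -> trajectories: (B, T, K, 3)
        out = self.mlp(z)
        return out.view(-1, self.num_frames, self.num_keypoints, 3)


def drive_gaussians(canonical_xyz, keypoints_canonical, keypoint_traj):
    """Warp canonical Gaussian centers with key point displacements.

    canonical_xyz:       (N, 3)    canonical Gaussian centers
    keypoints_canonical: (K, 3)    key points in the canonical pose
    keypoint_traj:       (T, K, 3) decoded key point trajectory
    Returns (T, N, 3): per-frame Gaussian centers.

    Uses a simple distance-based blending weight; the actual driving
    scheme in the paper may differ.
    """
    # Blending weights from (negative) distance to canonical key points.
    d = torch.cdist(canonical_xyz, keypoints_canonical)   # (N, K)
    w = torch.softmax(-d, dim=-1)                          # (N, K)
    disp = keypoint_traj - keypoints_canonical[None]       # (T, K, 3)
    # Blend key point displacements onto every Gaussian center.
    return canonical_xyz[None] + torch.einsum('nk,tkc->tnc', w, disp)


if __name__ == "__main__":
    decoder = MotionDecoder()
    z = torch.randn(1, 64)              # one sampled motion latent
    traj = decoder(z)[0]                # (T, K, 3)
    gaussians = torch.randn(1000, 3)    # canonical Gaussian centers
    kp_canon = traj[0]                  # first frame as the canonical pose
    animated = drive_gaussians(gaussians, kp_canon, traj)
    print(animated.shape)               # torch.Size([24, 1000, 3])
```

Under this sketch, the applications mentioned in the abstract follow naturally: interpolating between two latents z1 and z2 before decoding would yield 3D motion interpolation, and conditioning the choice of z on a text embedding would correspond to language-guided motion generation.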