NeuROK: Generative 4D Neural Object Kinematics
May 28, 2026
Authors: Chen Geng, Guangzhao He, Yue Gao, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu
cs.AI
Abstract
Data-driven approaches have revolutionized 3D vision, enabling transformers to effectively reconstruct and generate static 3D objects. However, generating simulative 4D dynamics -- realistic temporal deformations of static objects under various physical conditions -- remains challenging and often ad hoc, despite its importance in building comprehensive 3D world models. Most existing methods assume a predefined physical model and use system identification to estimate its parameters, which restricts them to specific object categories and small-scale datasets. We propose that these restrictions can be overcome by learning a data-driven kinematic state parameterization for object-centric physical systems. Specifically, we jointly learn a latent space representing all possible states of the object and a decoder that maps any sampled latent to a plausibly deformed shape of the object. We refer to this parameterization as Neural Object Kinematics (NeuROK), and train a transformer-based encoder-decoder model on a curated large-scale 4D dataset. This formulation and the learned model significantly simplify the generation of simulative dynamics, since we only need to consider the dynamics within a low-dimensional latent space, from the perspective of Lagrangian mechanics in classical physics. We demonstrate the effectiveness and generality of this neural simulation framework across diverse dynamic object types, showing clear advantages over prior work. Project page: https://chen-geng.com/neurok
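
The idea of simulating dynamics only in a low-dimensional latent space can be illustrated with a toy sketch. Nothing below comes from the paper: the linear map `decode` is a hypothetical stand-in for the learned transformer decoder, the quadratic potential and unit masses are assumed for simplicity, and all function names are made up for illustration. With a linear decoder x = rest + B z, the Lagrangian L = 1/2 ż^T (B^T M B) ż - V(x(z)) gives the latent equation of motion (B^T M B) z̈ = -B^T ∇V(x), which the sketch integrates with symplectic Euler.

```python
import numpy as np

def make_toy_system(n_points=4, latent_dim=2, seed=0):
    """Build a hypothetical object: rest shape, linear decoder weights, point masses."""
    rng = np.random.default_rng(seed)
    rest = rng.normal(size=n_points * 3)                 # flattened xyz rest shape
    basis = rng.normal(size=(n_points * 3, latent_dim))  # assumed linear "decoder" weights
    mass = np.ones(n_points * 3)                         # unit mass per coordinate (assumption)
    return rest, basis, mass

def decode(z, rest, basis):
    """Toy decoder: maps a latent state z to a deformed shape x."""
    return rest + basis @ z

def potential_grad(x, rest, stiffness=1.0):
    """Gradient of the assumed potential V(x) = 0.5 * k * ||x - rest||^2."""
    return stiffness * (x - rest)

def simulate(z0, v0, rest, basis, mass, dt=1e-2, steps=1000):
    """Integrate latent dynamics (B^T M B) z'' = -B^T grad V with symplectic Euler."""
    latent_mass = basis.T @ (mass[:, None] * basis)      # J^T M J with J = basis
    z, v = z0.copy(), v0.copy()
    trajectory = [z.copy()]
    for _ in range(steps):
        x = decode(z, rest, basis)
        latent_force = -basis.T @ potential_grad(x, rest)
        accel = np.linalg.solve(latent_mass, latent_force)
        v = v + dt * accel                               # update velocity first
        z = z + dt * v                                   # then position (symplectic Euler)
        trajectory.append(z.copy())
    return np.array(trajectory)

if __name__ == "__main__":
    rest, basis, mass = make_toy_system()
    traj = simulate(np.array([1.0, -0.5]), np.zeros(2), rest, basis, mass)
    shapes = np.stack([decode(z, rest, basis) for z in traj])
    print(traj.shape, shapes.shape)
```

The integrator only ever touches the two latent coordinates; full shapes are recovered by decoding afterwards. A nonlinear learned decoder would make the latent mass matrix state-dependent and add velocity-dependent terms to the Euler-Lagrange equations, which this linear sketch deliberately omits.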