NeuROK: Generative 4D Neural Object Kinematics

May 28, 2026
Authors: Chen Geng, Guangzhao He, Yue Gao, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu
cs.AI

Abstract

Data-driven approaches have revolutionized 3D vision, enabling transformers to effectively reconstruct and generate static 3D objects. However, generating simulative 4D dynamics (realistic temporal deformations of static objects under various physical conditions) remains challenging and is often handled ad hoc, despite its importance for building comprehensive 3D world models. Most existing methods assume a predefined physical model and use system identification to estimate its parameters, which restricts them to specific object categories and small-scale datasets. We propose that these restrictions can be overcome by learning a data-driven kinematic state parameterization for object-centric physical systems. Specifically, we learn both a latent space representing all possible states of the object and a decoder that maps any sampled latent to a plausibly deformed shape of the object. We refer to this parameterization as Neural Object Kinematics (NeuROK), and train a transformer-based encoder-decoder model on a curated large-scale 4D dataset. This formulation and the learned model significantly simplify the generation of simulative dynamics, since we only need to consider the dynamics within a low-dimensional latent space, from the perspective of Lagrangian mechanics in classical physics. We demonstrate the effectiveness and generality of this neural simulation framework across diverse dynamic object types, showing clear advantages over prior work. Project page: https://chen-geng.com/neurok
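
To make the idea of "dynamics within a low-dimensional latent space" concrete, here is a minimal toy sketch. It is not the NeuROK model: the decoder is a hand-written linear map standing in for a learned neural decoder, and every name and constant (`decode`, `MODE`, `STIFFNESS`, the spring potential, the integrator) is an assumption made for illustration. With a decoder x = decode(z), the Lagrangian in latent coordinates is L = 1/2 * M_z * z_dot^2 - V(decode(z)), where M_z = J^T M J and J is the decoder Jacobian. Because this toy decoder is linear, M_z is constant; a nonlinear learned decoder would add velocity-dependent terms that are omitted here.

```python
# Toy latent-space dynamics under an assumed linear decoder (illustration only).

REST = [0.0, 1.0, 2.0]   # rest positions of three points on a line
MODE = [0.0, 0.5, 1.0]   # displacement of each point per unit of latent z
MASS = [1.0, 1.0, 1.0]   # point masses
STIFFNESS = 4.0          # spring constant pulling each point toward rest


def decode(z):
    """Hypothetical decoder: 1-D latent state -> point positions."""
    return [r + z * m for r, m in zip(REST, MODE)]


def latent_mass():
    """Reduced mass M_z = J^T M J, with J = d(decode)/dz = MODE (constant)."""
    return sum(m * j * j for m, j in zip(MASS, MODE))


def potential(z):
    """Spring energy of the decoded shape relative to its rest shape."""
    return 0.5 * STIFFNESS * sum((x - r) ** 2 for x, r in zip(decode(z), REST))


def latent_force(z, eps=1e-6):
    """Generalized force -dV/dz, estimated by a central finite difference."""
    return -(potential(z + eps) - potential(z - eps)) / (2.0 * eps)


def energy(z, z_dot):
    """Total energy: kinetic term in latent coordinates plus potential."""
    return 0.5 * latent_mass() * z_dot ** 2 + potential(z)


def simulate(z0, z_dot0, dt=1e-3, steps=2000):
    """Integrate M_z * z_ddot = -dV/dz with semi-implicit (symplectic) Euler."""
    z, z_dot = z0, z_dot0
    m_z = latent_mass()
    trajectory = [decode(z)]
    for _ in range(steps):
        z_dot += dt * latent_force(z) / m_z  # update velocity first
        z += dt * z_dot                      # then position with new velocity
        trajectory.append(decode(z))
    return z, z_dot, trajectory
```

Calling `simulate(1.0, 0.0)` advances only the scalar latent state, while `trajectory` holds the decoded point positions at every step. That is the sense in which a kinematic parameterization reduces simulation to a small system of ordinary differential equations.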