Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation
March 17, 2026
Authors: Mutian Xu, Tianbao Zhang, Tianqi Liu, Zhaoxi Chen, Xiaoguang Han, Ziwei Liu
cs.AI
Abstract
Simulating robot-world interactions is a cornerstone of Embodied AI. Recently, a few works have shown promise in leveraging video generation to transcend the rigid visual/physical constraints of traditional simulators. However, they primarily operate in 2D space or are guided by static environmental cues, ignoring the fundamental reality that robot-world interactions are inherently 4D spatiotemporal events requiring precise interactive modeling. To restore this 4D essence while ensuring precise robot control, we introduce Kinema4D, a new action-conditioned 4D generative robotic simulator that disentangles robot-world interaction into: i) precise 4D representation of robot controls: we drive a URDF-based 3D robot via kinematics, producing a precise 4D robot control trajectory; ii) generative 4D modeling of environmental reactions: we project the 4D robot trajectory into a pointmap as a spatiotemporal visual signal, which conditions the generative model to synthesize complex environments' reactive dynamics as synchronized RGB/pointmap sequences. To facilitate training, we curate a large-scale dataset, Robo4D-200k, comprising 201,426 robot interaction episodes with high-quality 4D annotations. Extensive experiments demonstrate that our method effectively simulates physically plausible, geometry-consistent, and embodiment-agnostic interactions that faithfully mirror diverse real-world dynamics. It is also the first to show potential for zero-shot transfer, providing a high-fidelity foundation for advancing next-generation embodied simulation.
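The two-stage pipeline described above — kinematics producing a robot trajectory, then projecting it into a pointmap signal — can be sketched minimally. The 2-link planar arm, the camera intrinsics, and the 64x64 pointmap resolution below are illustrative assumptions for one timestep of a trajectory; they are not the paper's actual URDF models, camera setup, or rendering pipeline.

```python
import numpy as np

def fk_planar_2link(theta1, theta2, l1=0.4, l2=0.3):
    """Forward kinematics for a toy 2-link planar arm (an illustrative
    stand-in for URDF-driven kinematics): returns 3D positions of the
    base, elbow, and end effector in the robot base frame (z = 0 plane)."""
    p1 = np.array([l1 * np.cos(theta1), l1 * np.sin(theta1), 0.0])
    p2 = p1 + np.array([l2 * np.cos(theta1 + theta2),
                        l2 * np.sin(theta1 + theta2), 0.0])
    return np.stack([np.zeros(3), p1, p2])

def project_to_pointmap(points_cam, K, hw=(64, 64)):
    """Rasterize 3D points (camera frame) into an H x W x 3 'pointmap':
    each covered pixel stores the 3D coordinates of the point that
    projects onto it; uncovered pixels stay zero."""
    H, W = hw
    pointmap = np.zeros((H, W, 3), dtype=np.float32)
    uv = (K @ points_cam.T).T          # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]        # perspective divide by depth
    for p, (u, v) in zip(points_cam, uv):
        ui, vi = int(round(u)), int(round(v))
        if 0 <= ui < W and 0 <= vi < H:
            pointmap[vi, ui] = p       # store XYZ at the pixel
    return pointmap

# One timestep of the 4D control trajectory: joint angles -> link
# positions -> camera frame -> pointmap (the spatiotemporal signal
# that would condition the generative model).
K = np.array([[60.0, 0.0, 32.0],      # assumed intrinsics for a 64x64 frame
              [0.0, 60.0, 32.0],
              [0.0, 0.0, 1.0]])
joints_world = fk_planar_2link(0.3, -0.5)
points_cam = joints_world + np.array([0.0, 0.0, 1.5])  # camera 1.5 m away
pmap = project_to_pointmap(points_cam, K)
print(np.count_nonzero(pmap.any(axis=-1)))  # pixels carrying 3D points
```

Repeating this per frame yields a pointmap sequence; in the paper's formulation such a sequence conditions the video generator, so that the synthesized RGB frames stay geometrically aligned with the commanded robot motion.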