Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
March 4, 2026
Authors: Tal Daniel, Carl Qi, Dan Haramati, Amir Zadeh, Chuan Li, Aviv Tamar, Deepak Pathak, David Held
cs.AI
Abstract
We introduce the Latent Particle World Model (LPWM), a self-supervised object-centric world model that scales to real-world multi-object datasets and is applicable to decision-making. LPWM autonomously discovers keypoints, bounding boxes, and object masks directly from video data, enabling it to learn rich scene decompositions without supervision. Our architecture is trained end-to-end purely from videos and supports flexible conditioning on actions, language, and image goals. LPWM models stochastic particle dynamics via a novel latent action module and achieves state-of-the-art results on diverse real-world and synthetic datasets. Beyond stochastic video modeling, LPWM is readily applicable to decision-making, including goal-conditioned imitation learning, as we demonstrate in the paper. Code, data, pre-trained models, and video rollouts are available at https://taldatech.github.io/lpwm-web
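To make the object-centric framing concrete, the sketch below illustrates the general idea of a particle-based scene state and an autoregressive, action-conditioned rollout. This is a hypothetical toy illustration, not the authors' architecture or API: all names (`Particle`, `dynamics_step`, `rollout`) and the Gaussian-jitter transition are invented stand-ins for the learned stochastic dynamics described in the abstract.

```python
# Hypothetical sketch (not the authors' code): an object-centric world
# model represents a scene as a set of latent "particles", each carrying
# a keypoint position, a bounding-box scale, and a feature vector, and
# rolls the set forward with a stochastic, action-conditioned transition.
import random
from dataclasses import dataclass


@dataclass
class Particle:
    xy: tuple       # keypoint position (normalized image coordinates)
    scale: tuple    # bounding-box width/height
    features: list  # latent appearance features


def dynamics_step(particles, action=None, noise=0.01, rng=None):
    """Toy stochastic transition: jitter each keypoint with Gaussian
    noise; an optional 2-D action shifts all particles (a stand-in for
    the learned action conditioning)."""
    rng = rng or random.Random(0)
    ax, ay = action if action is not None else (0.0, 0.0)
    out = []
    for p in particles:
        x = p.xy[0] + ax + rng.gauss(0.0, noise)
        y = p.xy[1] + ay + rng.gauss(0.0, noise)
        out.append(Particle((x, y), p.scale, p.features))
    return out


def rollout(particles, actions, rng=None):
    """Autoregressive rollout: apply dynamics_step once per action."""
    rng = rng or random.Random(0)
    traj = [particles]
    for a in actions:
        traj.append(dynamics_step(traj[-1], action=a, rng=rng))
    return traj


# Two-object scene; push both objects to the right for three steps.
scene = [Particle((0.2, 0.3), (0.1, 0.1), [0.5] * 4),
         Particle((0.7, 0.6), (0.2, 0.1), [0.1] * 4)]
traj = rollout(scene, actions=[(0.05, 0.0)] * 3)
```

In the actual model the transition would be a learned network over the particle set and the "action" could also be a language or image goal; the sketch only fixes the data flow: set-structured state in, set-structured state out, one step per action.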