Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling

Abstract

We introduce the Latent Particle World Model (LPWM), a self-supervised object-centric world model that scales to real-world multi-object datasets and is applicable to decision-making. LPWM autonomously discovers keypoints, bounding boxes, and object masks directly from video data, enabling it to learn rich scene decompositions without supervision. Our architecture is trained end-to-end purely from videos and supports flexible conditioning on actions, language, and image goals. LPWM models stochastic particle dynamics via a novel latent action module and achieves state-of-the-art results on diverse real-world and synthetic datasets. Beyond stochastic video modeling, LPWM is readily applicable to decision-making, including goal-conditioned imitation learning, as we demonstrate in the paper. Code, data, pre-trained models, and video rollouts are available at https://taldatech.github.io/lpwm-web

