Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

Abstract

Recent advancements in video generation have enabled the development of "world models" capable of simulating potential futures for robotics and planning. However, specifying precise goals for these models remains a challenge: text instructions are often too abstract to capture physical nuances, while target images are frequently infeasible to specify for dynamic tasks. To address this, we introduce Goal Force, a novel framework that allows users to define goals via explicit force vectors and intermediate dynamics, mirroring how humans conceptualize physical tasks. We train a video generation model on a curated dataset of synthetic causal primitives, such as elastic collisions and falling dominoes, teaching it to propagate forces through time and space. Despite being trained only on simple physics data, our model generalizes zero-shot to complex, real-world scenarios, including tool manipulation and multi-object causal chains. Our results suggest that by grounding video generation in fundamental physical interactions, models can emerge as implicit neural physics simulators, enabling precise, physics-aware planning without reliance on external engines. We release all datasets, code, model weights, and interactive video demos at our project page.
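The abstract names elastic collisions as one of the synthetic causal primitives. As a minimal illustration of the physics behind such a primitive, the sketch below computes post-collision velocities for a perfectly elastic one-dimensional collision, along with the impulse delivered to the struck body. This is not the authors' data pipeline: the names `Body`, `elastic_collision_1d`, and `impulse_on` are hypothetical, and the one-dimensional, frictionless setup is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Body:
    """Hypothetical rigid body moving along a single axis."""
    mass: float      # kg
    velocity: float  # m/s


def elastic_collision_1d(a: Body, b: Body) -> Tuple[float, float]:
    """Return post-collision velocities (va, vb) for a perfectly elastic
    1D collision, derived from conservation of momentum and kinetic energy."""
    total = a.mass + b.mass
    va = ((a.mass - b.mass) * a.velocity + 2.0 * b.mass * b.velocity) / total
    vb = ((b.mass - a.mass) * b.velocity + 2.0 * a.mass * a.velocity) / total
    return va, vb


def impulse_on(body: Body, velocity_after: float) -> float:
    """Impulse (N*s) received by `body`: its change in momentum."""
    return body.mass * (velocity_after - body.velocity)


if __name__ == "__main__":
    striker = Body(mass=2.0, velocity=3.0)
    target = Body(mass=1.0, velocity=-1.0)
    va, vb = elastic_collision_1d(striker, target)
    print("striker velocity after:", va)
    print("target velocity after:", vb)
    print("impulse on target:", impulse_on(target, vb))
```

In this toy setting, the impulse on the target plays a role loosely analogous to a goal specified as a force on an object: the desired effect on the target is fixed, and the striker's motion is the intermediate dynamics that produces it.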
