WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

Abstract

The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching, spanning policy evaluation, policy improvement, and test-time planning, all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: (i) fidelity (producing simulated trajectories that correlate with reality), (ii) consistency (producing simulated trajectories that remain coherent over long horizons), and (iii) efficiency (producing simulated trajectories quickly). We propose WEAVER (World Estimation Across Views for Embodied Reasoning), a WM architecture that achieves all three desiderata simultaneously and provides state-of-the-art results on robotic manipulation tasks. WEAVER is a multi-view WM trained to predict future latents and reward values via a flow-matching loss. We distill the key design decisions across model architecture, memory, and prediction objectives required to unlock the kinds of long-horizon dynamic manipulation tasks that have confounded prior world modeling approaches. We apply WEAVER on robotic hardware, demonstrating its effectiveness at policy evaluation (ρ=0.870 correlation with real-world success rate), policy improvement (real-world success rate improvement of 38% on top of the π_{0.5} robot foundation model), and test-time planning (real-world success rate improvement of 14% with a 5-10× speedup over prior WMs). WEAVER also outperforms prior WMs when evaluated on out-of-distribution scenarios. Code, models, and videos are available at https://arnavkj1995.github.io/WEAVER/.
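To make the phrase "trained to predict future latents and reward values via a flow-matching loss" concrete, the following is a minimal NumPy sketch of a generic conditional flow-matching objective. It is not WEAVER's implementation. The straight-line (rectified-flow style) interpolation path, the choice to concatenate the latent and the scalar reward into one target vector, and the names `flow_matching_loss` and `velocity_fn` are all assumptions made for illustration. Conditioning on past observations and actions is omitted.

```python
import numpy as np


def flow_matching_loss(velocity_fn, target, rng):
    """Monte Carlo estimate of a linear-path conditional flow-matching loss.

    velocity_fn: hypothetical model, maps (x_t, t) -> predicted velocity,
                 where x_t has shape (batch, dim) and t has shape (batch, 1).
    target:      array of shape (batch, dim); here assumed to hold a future
                 latent concatenated with a reward value.
    rng:         numpy.random.Generator used for the noise and time samples.
    """
    batch = target.shape[0]
    noise = rng.standard_normal(target.shape)   # x_0 ~ N(0, I)
    t = rng.uniform(size=(batch, 1))            # one time in [0, 1) per sample
    x_t = (1.0 - t) * noise + t * target        # point on the straight path
    v_target = target - noise                   # velocity of that path
    v_pred = velocity_fn(x_t, t)
    # Squared error summed over features, averaged over the batch.
    return float(np.mean(np.sum((v_pred - v_target) ** 2, axis=-1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.standard_normal((4, 16))      # placeholder future latents
    rewards = rng.uniform(size=(4, 1))          # placeholder reward values
    targets = np.concatenate([latents, rewards], axis=-1)

    # Untrained stand-in for the network: always predicts zero velocity.
    def zero_model(x_t, t):
        return np.zeros_like(x_t)

    print(flow_matching_loss(zero_model, targets, rng))
```

In this formulation the model regresses the velocity that carries a noise sample to the target, so sampling at test time amounts to integrating the learned velocity field from noise. A coarser integration uses fewer model evaluations, which is one common way flow-based generators trade accuracy for speed. Whether WEAVER's efficiency comes from this mechanism is not stated in the abstract.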