dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

Abstract

Evaluating robot policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates a new methodology for scalable robot policy evaluation. In this paper, we propose dWorldEval, which uses a discrete diffusion world model as a scalable evaluation proxy for robot policies. Specifically, dWorldEval maps all modalities, including vision, language, and robot actions, into a unified token space and models them with a single transformer-based denoising network. Building on this architecture, we employ a sparse keyframe memory to maintain spatiotemporal consistency. We also introduce a progress token that indicates the degree of task completion. At inference, the model jointly predicts future observations and the progress token, so that success can be determined automatically once the progress reaches 1. Extensive experiments demonstrate that dWorldEval significantly outperforms previous approaches, namely WorldEval, Ctrl-World, and WorldGym, on LIBERO, RoboTwin, and multiple real-robot tasks. It paves the way for a new architectural paradigm for building world simulators for robot evaluation at scale.
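The inference-time use of the progress token can be illustrated with a minimal sketch. This is not the authors' implementation: the `Policy` and `WorldModel` callables, the `RolloutResult` type, the `max_steps` budget, and the representation of progress as a float are all assumptions made for illustration. The only detail taken from the abstract is that a rollout counts as a success when the predicted progress reaches 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Placeholder types (assumptions): real observations and actions would be
# token sequences in the unified token space described in the abstract.
Observation = List[float]
Action = List[float]

# Hypothetical interfaces; the abstract does not specify an API.
Policy = Callable[[Observation, str], Action]
WorldModel = Callable[[Observation, Action, str], Tuple[Observation, float]]


@dataclass
class RolloutResult:
    success: bool
    steps: int
    final_progress: float


def evaluate_in_world_model(
    policy: Policy,
    world_model: WorldModel,
    first_obs: Observation,
    instruction: str,
    max_steps: int = 50,             # assumed rollout budget
    success_threshold: float = 1.0,  # "progress reaches 1" per the abstract
) -> RolloutResult:
    """Roll a policy out inside a world model and read success off the progress signal."""
    obs = first_obs
    progress = 0.0
    for step in range(1, max_steps + 1):
        # The policy acts on the imagined observation, not on a real robot.
        action = policy(obs, instruction)
        # The world model jointly predicts the next observation and the progress.
        obs, progress = world_model(obs, action, instruction)
        if progress >= success_threshold:
            return RolloutResult(True, step, progress)
    # Budget exhausted without the progress signal reaching the threshold.
    return RolloutResult(False, max_steps, progress)


def success_rate(results: List[RolloutResult]) -> float:
    """Fraction of successful rollouts; 0.0 for an empty list."""
    return sum(r.success for r in results) / len(results) if results else 0.0
```

Under these assumptions, a policy's success rate comes from repeating the rollout over many initial observations and instructions and averaging the `success` flags, with no separate success classifier or human labeling in the loop.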
