GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-Loop Video World Simulators for Robotic Manipulation

Abstract

We introduce GE-Sim 2.0 (Genie Envisioner World Simulator 2.0), a closed-loop video world simulator for robotic manipulation. Building on the action-conditioned video generation framework of Genie Envisioner, GE-Sim 2.0 is re-trained on thousands of hours of real-world robot data spanning teleoperation, contact-rich interaction, and on-robot policy deployment, substantially improving action-following fidelity and trajectory coverage. On top of this foundation, three new modules close the loop from video simulation to policy learning: a state expert that decodes proprioceptive state from video latents to support next-chunk prediction by downstream VLA policies; a world judge that scores generated rollouts against task instructions, yielding machine-verifiable success signals and rewards in place of manual inspection; and an acceleration framework that delivers a 25-frame rollout in 2.3 seconds on a single H100, with up to 4× frame skipping at inference for long-horizon evaluation. GE-Sim 2.0 tops the public WorldArena leaderboard at only 2B parameters, outperforming both dedicated robotic world models and closed-source general video generators. Policies trained against its rollouts and rewards translate into measurable real-world gains, establishing GE-Sim 2.0 as a practical platform for scalable evaluation and closed-loop learning of manipulation policies.
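The data flow of the closed loop described above (policy, world model, state expert, world judge) can be illustrated with a toy sketch. Everything in it is hypothetical: the function names, the one-dimensional "latent", the additive dynamics, and the threshold-based judge are stand-ins chosen for illustration and are not taken from GE-Sim 2.0's code or API. In particular, the numeric `goal` stands in for a task instruction. The sketch only shows how decoded state from each generated chunk could feed the next action chunk, and how a judge could turn a finished rollout into a reward.

```python
from typing import List, Sequence, Tuple

State = List[float]   # toy proprioceptive state
Action = List[float]  # toy action (a state delta)
Latent = List[float]  # toy stand-in for a video latent


def toy_world_model(latent: Latent, actions: Sequence[Action]) -> List[Latent]:
    """Hypothetical stand-in for action-conditioned video generation:
    each action shifts the latent additively, producing one latent per action."""
    frames: List[Latent] = []
    current = list(latent)
    for action in actions:
        current = [c + a for c, a in zip(current, action)]
        frames.append(current)
    return frames


def toy_state_expert(latent: Latent) -> State:
    """Hypothetical state decoder: here the latent already is the state."""
    return list(latent)


def toy_policy(state: State, goal: State, chunk_len: int) -> List[Action]:
    """Hypothetical chunked policy: plans chunk_len actions open-loop,
    each moving halfway from the predicted state toward the goal."""
    actions: List[Action] = []
    predicted = list(state)
    for _ in range(chunk_len):
        action = [0.5 * (g - s) for g, s in zip(goal, predicted)]
        predicted = [s + a for s, a in zip(predicted, action)]
        actions.append(action)
    return actions


def toy_world_judge(frames: Sequence[Latent], goal: State, tol: float = 0.01) -> float:
    """Hypothetical judge: reward 1.0 if the final frame is within tol of the goal."""
    final_state = toy_state_expert(frames[-1])
    error = max(abs(g - s) for g, s in zip(goal, final_state))
    return 1.0 if error <= tol else 0.0


def closed_loop_rollout(
    init_latent: Latent, goal: State, n_chunks: int, chunk_len: int
) -> Tuple[List[Latent], float]:
    """Alternate policy and world model chunk by chunk, then score the rollout."""
    frames: List[Latent] = []
    latent = list(init_latent)
    for _ in range(n_chunks):
        state = toy_state_expert(latent)                # decode state from latent
        actions = toy_policy(state, goal, chunk_len)    # next-chunk prediction
        chunk = toy_world_model(latent, actions)        # simulate the chunk
        frames.extend(chunk)
        latent = chunk[-1]                              # continue from last frame
    return frames, toy_world_judge(frames, goal)


if __name__ == "__main__":
    rollout, reward = closed_loop_rollout([0.0], [1.0], n_chunks=3, chunk_len=4)
    print(len(rollout), reward)
```

In a real system each stub would be replaced by a learned model, and the reward from the judge would be used to evaluate or update the policy without running the physical robot.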