GE-Sim 2.0: A Roadmap Toward a Comprehensive Closed-Loop Video World Simulator for Robotic Manipulation

Abstract

We introduce GE-Sim 2.0 (Genie Envisioner World Simulator 2.0), a closed-loop video world simulator for robotic manipulation. Building on the action-conditioned video generation framework of Genie Envisioner, GE-Sim 2.0 is retrained on thousands of hours of real-world robot data spanning teleoperation, contact-rich interaction, and on-robot policy deployment, which substantially improves action-following fidelity and trajectory coverage. On top of this foundation, three new modules close the loop from video simulation to policy learning: a state expert that decodes proprioceptive state from video latents to support next-chunk prediction by downstream VLA policies; a world judge that scores generated rollouts against task instructions, yielding machine-verifiable success signals and rewards in place of manual inspection; and an acceleration framework that generates a 25-frame rollout in 2.3 seconds on a single H100 and supports up to 4× frame skipping at inference for long-horizon evaluation. With only 2B parameters, GE-Sim 2.0 tops the public WorldArena leaderboard, outperforming both dedicated robotic world models and closed-source general video generators. Policies trained against its rollouts and rewards show measurable real-world gains, establishing GE-Sim 2.0 as a practical platform for scalable evaluation and closed-loop learning of manipulation policies.
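
To make the closed loop described above concrete, the following is a minimal sketch of how a policy, the video simulator, the state expert, and the world judge could be wired together. It is not the GE-Sim 2.0 implementation: every name here (`closed_loop_rollout`, `simulate_chunk`, `decode_state`, `judge`, `success_threshold`, `effective_fps`) is hypothetical, frames stand in for video latents, and the throughput helper assumes that frame skipping leaves the 2.3-second generation time unchanged.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in types; real systems would use image/latent tensors.
Frame = List[float]
Action = List[float]
State = List[float]


@dataclass
class RolloutResult:
    frames: List[Frame]   # all frames, including the initial one
    states: List[State]   # one proprioceptive state per chunk boundary
    reward: float         # score assigned by the judge
    success: bool         # reward compared against an assumed threshold


def closed_loop_rollout(
    policy: Callable[[str, Frame, State], List[Action]],
    simulate_chunk: Callable[[Frame, List[Action]], List[Frame]],
    decode_state: Callable[[Frame], State],
    judge: Callable[[str, List[Frame]], float],
    instruction: str,
    first_frame: Frame,
    first_state: State,
    num_chunks: int,
    success_threshold: float = 0.5,  # assumption, not a value from the paper
) -> RolloutResult:
    """Alternate between policy and simulator, then score the whole rollout."""
    frames = [first_frame]
    states = [first_state]
    for _ in range(num_chunks):
        # The policy predicts the next action chunk from the latest observation.
        actions = policy(instruction, frames[-1], states[-1])
        # The simulator generates the frames that follow from those actions.
        new_frames = simulate_chunk(frames[-1], actions)
        frames.extend(new_frames)
        # The state expert recovers proprioceptive state for the next step.
        states.append(decode_state(new_frames[-1]))
    # The judge scores the full rollout against the task instruction.
    reward = judge(instruction, frames)
    return RolloutResult(frames, states, reward, reward >= success_threshold)


def effective_fps(frames: int = 25, seconds: float = 2.3, frame_skip: int = 1) -> float:
    """Real-time frames covered per second of compute, assuming fixed latency."""
    return frames * frame_skip / seconds
```

Under these assumptions, the reported 25 frames in 2.3 seconds corresponds to roughly 10.9 generated frames per second, and 4× frame skipping would cover about 43.5 frames of robot time per second of compute.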