Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
August 7, 2025
Authors: Yue Liao, Pengfei Zhou, Siyuan Huang, Donglin Yang, Shengcong Chen, Yuxin Jiang, Yue Hu, Jingbin Cai, Si Liu, Jianlan Luo, Liliang Chen, Shuicheng Yan, Maoqing Yao, Guanghui Ren
cs.AI
Abstract
We introduce Genie Envisioner (GE), a unified world foundation platform for
robotic manipulation that integrates policy learning, evaluation, and
simulation within a single video-generative framework. At its core, GE-Base is
a large-scale, instruction-conditioned video diffusion model that captures the
spatial, temporal, and semantic dynamics of real-world robotic interactions in
a structured latent space. Built upon this foundation, GE-Act maps latent
representations to executable action trajectories through a lightweight
flow-matching decoder, enabling precise and generalizable policy inference
across diverse embodiments with minimal supervision. To support scalable
evaluation and training, GE-Sim serves as an action-conditioned neural
simulator, producing high-fidelity rollouts for closed-loop policy development.
The platform is further equipped with EWMBench, a standardized benchmark suite
measuring visual fidelity, physical consistency, and instruction-action
alignment. Together, these components establish Genie Envisioner as a scalable
and practical foundation for instruction-driven, general-purpose embodied
intelligence. All code, models, and benchmarks will be released publicly.
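
The abstract says GE-Act decodes actions with flow matching but gives no detail. As a rough illustration of the general idea only, the sketch below integrates a velocity field from Gaussian noise (t = 0) to an action vector (t = 1) with Euler steps. Everything here is an assumption made for illustration: `toy_velocity`, `decode_actions`, the step count, and the three-dimensional target are hypothetical, and the closed-form velocity stands in for a learned network. It is not the GE-Act implementation.

```python
import random


def toy_velocity(x_t, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For straight-line paths x_t = (1 - t) * x_0 + t * x_1 toward a single
    known target x_1, the ideal velocity is (x_1 - x_t) / (1 - t).
    A real decoder would predict this from latent features instead.
    """
    return [(x1 - x) / (1.0 - t) for x, x1 in zip(x_t, target)]


def decode_actions(target, steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Hypothetical 3-dimensional action; the noise sample is carried onto it.
    print(decode_actions([0.1, -0.2, 0.3]))
```

In this toy setting the final Euler step lands on the target up to floating-point error, because the velocity always points straight at it. With a learned, latent-conditioned velocity field the same loop would yield a sample from the action distribution rather than a fixed point.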