Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

August 7, 2025
Authors: Yue Liao, Pengfei Zhou, Siyuan Huang, Donglin Yang, Shengcong Chen, Yuxin Jiang, Yue Hu, Jingbin Cai, Si Liu, Jianlan Luo, Liliang Chen, Shuicheng Yan, Maoqing Yao, Guanghui Ren
cs.AI

Abstract

We introduce Genie Envisioner (GE), a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation within a single video-generative framework. At its core, GE-Base is a large-scale, instruction-conditioned video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world robotic interactions in a structured latent space. Built upon this foundation, GE-Act maps latent representations to executable action trajectories through a lightweight, flow-matching decoder, enabling precise and generalizable policy inference across diverse embodiments with minimal supervision. To support scalable evaluation and training, GE-Sim serves as an action-conditioned neural simulator, producing high-fidelity rollouts for closed-loop policy development. The platform is further equipped with EWMBench, a standardized benchmark suite measuring visual fidelity, physical consistency, and instruction-action alignment. Together, these components establish Genie Envisioner as a scalable and practical foundation for instruction-driven, general-purpose embodied intelligence. All code, models, and benchmarks will be released publicly.