SIMA 2: A Generalist Embodied Agent for Virtual Worlds
December 4, 2025
Authors: SIMA team, Adrian Bolton, Alexander Lerchner, Alexandra Cordell, Alexandre Moufarek, Andrew Bolt, Andrew Lampinen, Anna Mitenkova, Arne Olav Hallingstad, Bojan Vujatovic, Bonnie Li, Cong Lu, Daan Wierstra, Daniel P. Sawyer, Daniel Slater, David Reichert, Davide Vercelli, Demis Hassabis, Drew A. Hudson, Duncan Williams, Ed Hirst, Fabio Pardo, Felix Hill, Frederic Besse, Hannah Openshaw, Harris Chan, Hubert Soyer, Jane X. Wang, Jeff Clune, John Agapiou, John Reid, Joseph Marino, Junkyung Kim, Karol Gregor, Kaustubh Sridhar, Kay McKinney, Laura Kampis, Lei M. Zhang, Loic Matthey, Luyu Wang, Maria Abi Raad, Maria Loks-Thompson, Martin Engelcke, Matija Kecman, Matthew Jackson, Maxime Gazeau, Ollie Purkiss, Oscar Knagg, Peter Stys, Piermaria Mendolicchio, Raia Hadsell, Rosemary Ke, Ryan Faulkner, Sarah Chakera, Satinder Singh Baveja, Shane Legg, Sheleem Kashem, Tayfun Terzi, Thomas Keck, Tim Harley, Tim Scholtes, Tyson Roberts, Volodymyr Mnih, Yulan Liu, Zhengdong Wang, Zoubin Ghahramani
cs.AI
Abstract
We introduce SIMA 2, a generalist embodied agent that understands and acts in a wide variety of 3D virtual worlds. Built upon a Gemini foundation model, SIMA 2 represents a significant step toward active, goal-directed interaction within an embodied environment. Unlike prior work (e.g., SIMA 1) limited to simple language commands, SIMA 2 acts as an interactive partner, capable of reasoning about high-level goals, conversing with the user, and handling complex instructions given through language and images. Across a diverse portfolio of games, SIMA 2 substantially closes the gap with human performance and demonstrates robust generalization to previously unseen environments, all while retaining the base model's core reasoning capabilities. Furthermore, we demonstrate a capacity for open-ended self-improvement: by leveraging Gemini to generate tasks and provide rewards, SIMA 2 can autonomously learn new skills from scratch in a new environment. This work validates a path toward creating versatile and continuously learning agents for both virtual and, eventually, physical worlds.