Gemini Robotics: Bringing AI into the Physical World
March 25, 2025
Authors: Gemini Robotics Team, Saminda Abeyruwan, Joshua Ainslie, Jean-Baptiste Alayrac, Montserrat Gonzalez Arenas, Travis Armstrong, Ashwin Balakrishna, Robert Baruch, Maria Bauza, Michiel Blokzijl, Steven Bohez, Konstantinos Bousmalis, Anthony Brohan, Thomas Buschmann, Arunkumar Byravan, Serkan Cabi, Ken Caluwaerts, Federico Casarini, Oscar Chang, Jose Enrique Chen, Xi Chen, Hao-Tien Lewis Chiang, Krzysztof Choromanski, David D'Ambrosio, Sudeep Dasari, Todor Davchev, Coline Devin, Norman Di Palo, Tianli Ding, Adil Dostmohamed, Danny Driess, Yilun Du, Debidatta Dwibedi, Michael Elabd, Claudio Fantacci, Cody Fong, Erik Frey, Chuyuan Fu, Marissa Giustina, Keerthana Gopalakrishnan, Laura Graesser, Leonard Hasenclever, Nicolas Heess, Brandon Hernaez, Alexander Herzog, R. Alex Hofer, Jan Humplik, Atil Iscen, Mithun George Jacob, Deepali Jain, Ryan Julian, Dmitry Kalashnikov, M. Emre Karagozler, Stefani Karp, Chase Kew, Jerad Kirkland, Sean Kirmani, Yuheng Kuang, Thomas Lampe, Antoine Laurens, Isabel Leal, Alex X. Lee, Tsang-Wei Edward Lee, Jacky Liang, Yixin Lin, Sharath Maddineni, Anirudha Majumdar, Assaf Hurwitz Michaely, Robert Moreno, Michael Neunert, Francesco Nori, Carolina Parada, Emilio Parisotto, Peter Pastor, Acorn Pooley, Kanishka Rao, Krista Reymann, Dorsa Sadigh, Stefano Saliceti, Pannag Sanketi, Pierre Sermanet, Dhruv Shah, Mohit Sharma, Kathryn Shea, Charles Shu, Vikas Sindhwani, Sumeet Singh, Radu Soricut, Jost Tobias Springenberg, Rachel Sterneck, Razvan Surdulescu, Jie Tan, Jonathan Tompson, Vincent Vanhoucke, Jake Varley, Grace Vesom, Giulia Vezzani, Oriol Vinyals, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Fei Xia, Ted Xiao, Annie Xie, Jinyu Xie, Peng Xu, Sichun Xu, Ying Xu, Zhuo Xu, Yuxiang Yang, Rui Yao, Sergey Yaroshenko, Wenhao Yu, Wentao Yuan, Jingwei Zhang, Tingnan Zhang, Allan Zhou, Yuxiang Zhou
cs.AI
Abstract
Recent advancements in large multimodal models have led to the emergence of
remarkable generalist capabilities in digital domains, yet their translation to
physical agents such as robots remains a significant challenge. This report
introduces a new family of AI models purposefully designed for robotics and
built upon the foundation of Gemini 2.0. We present Gemini Robotics, an
advanced Vision-Language-Action (VLA) generalist model capable of directly
controlling robots. Gemini Robotics executes smooth and reactive movements to
tackle a wide range of complex manipulation tasks while also being robust to
variations in object types and positions, handling unseen environments as well
as following diverse, open-vocabulary instructions. We show that with
additional fine-tuning, Gemini Robotics can be specialized to new capabilities
including solving long-horizon, highly dexterous tasks, learning new
short-horizon tasks from as few as 100 demonstrations and adapting to
completely novel robot embodiments. This is made possible because Gemini
Robotics builds on top of the Gemini Robotics-ER model, the second model we
introduce in this work. Gemini Robotics-ER (Embodied Reasoning) extends
Gemini's multimodal reasoning capabilities into the physical world, with
enhanced spatial and temporal understanding. This enables capabilities relevant
to robotics including object detection, pointing, trajectory and grasp
prediction, as well as multi-view correspondence and 3D bounding box
predictions. We show how this novel combination can support a variety of
robotics applications. We also discuss and address important safety
considerations related to this new class of robotics foundation models. The
Gemini Robotics family marks a substantial step towards developing
general-purpose robots that realize AI's potential in the physical world.