LEGENT: Open Platform for Embodied Agents
April 28, 2024
Authors: Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Despite advancements in Large Language Models (LLMs) and Large Multimodal
Models (LMMs), their integration into language-grounded, human-like embodied
agents remains incomplete, hindering complex real-life task performance in
physical environments. Existing integrations often feature limited open
sourcing, challenging collective progress in this field. We introduce LEGENT,
an open, scalable platform for developing embodied agents using LLMs and LMMs.
LEGENT offers a dual approach: a rich, interactive 3D environment with
communicable and actionable agents, paired with a user-friendly interface, and
a sophisticated data generation pipeline utilizing advanced algorithms to
exploit supervision from simulated worlds at scale. In our experiments, an
embryonic vision-language-action model trained on LEGENT-generated data
surpasses GPT-4V in embodied tasks, showcasing promising generalization
capabilities.