LEGENT: Open Platform for Embodied Agents
April 28, 2024
Authors: Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Despite advancements in Large Language Models (LLMs) and Large Multimodal
Models (LMMs), their integration into language-grounded, human-like embodied
agents remains incomplete, hindering complex real-life task performance in
physical environments. Existing integrations often feature limited open
sourcing, challenging collective progress in this field. We introduce LEGENT,
an open, scalable platform for developing embodied agents using LLMs and LMMs.
LEGENT offers a dual approach: a rich, interactive 3D environment with
communicable and actionable agents, paired with a user-friendly interface, and
a sophisticated data generation pipeline utilizing advanced algorithms to
exploit supervision from simulated worlds at scale. In our experiments, an
embryonic vision-language-action model trained on LEGENT-generated data
surpasses GPT-4V in embodied tasks, showcasing promising generalization
capabilities.
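The abstract's core idea for the data pipeline is to exploit privileged simulator state as supervision: because the simulated world exposes ground-truth positions and object states, a scripted oracle can label each observation with the correct action, yielding (instruction, observation, action) triples for training at scale. The following is a minimal toy sketch of that pattern; all function and field names here are hypothetical illustrations, not the actual LEGENT API.

```python
import json

def scripted_policy(agent_pos, target_pos):
    """Hypothetical oracle: derive the ground-truth action from privileged
    simulator state (positions the trained agent would not see directly)."""
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    if dx == 0 and dy == 0:
        return "pick_up"          # already at the target object
    if abs(dx) > abs(dy):
        return "turn_right" if dx > 0 else "turn_left"
    return "move_forward"

def generate_sample(instruction, agent_pos, target_pos):
    """Pair a language instruction and an observation with an oracle-labeled
    action, producing one supervised training triple."""
    return {
        "instruction": instruction,
        "observation": {"agent": agent_pos, "target": target_pos},
        "action": scripted_policy(agent_pos, target_pos),
    }

# Generating a small batch of labeled triples from simulated scenes.
samples = [
    generate_sample("fetch the apple", (0, 0), (0, 3)),
    generate_sample("fetch the apple", (0, 0), (4, 1)),
    generate_sample("fetch the apple", (2, 2), (2, 2)),
]
print(json.dumps(samples, indent=2))
```

A vision-language-action model would then be trained to predict `action` from the rendered observation and instruction alone, without access to the oracle's privileged state.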