MindAgent: Emergent Gaming Interaction
September 18, 2023
Authors: Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao
cs.AI
Abstract
Large Language Models (LLMs) can perform complex
scheduling in a multi-agent system and can coordinate these agents to
complete sophisticated tasks that require extensive collaboration. However,
despite the introduction of numerous gaming frameworks, the community lacks
sufficient benchmarks for building general multi-agent collaboration
infrastructure that encompasses both LLM and human-NPC collaboration. In this
work, we propose a novel infrastructure, MindAgent, to evaluate emergent
planning and coordination capabilities for gaming interaction. In particular, our
infrastructure leverages existing gaming frameworks to i) require understanding
of the coordinator for a multi-agent system, ii) collaborate with human players
via appropriate instructions without fine-tuning, and iii) establish in-context
learning from few-shot prompts with feedback. Furthermore, we introduce CUISINEWORLD, a new
gaming scenario and related benchmark that evaluate multi-agent collaboration
efficiency and supervise multiple agents playing the game simultaneously. We
conduct comprehensive evaluations with a new automatic metric, CoS, for
calculating collaboration efficiency. Finally, our infrastructure can be deployed into
real-world gaming scenarios in a customized VR version of CUISINEWORLD and
adapted to the broader existing Minecraft gaming domain. We hope our findings on
LLMs and the new infrastructure for general-purpose scheduling and coordination
can help shed light on how such skills can be obtained by learning from large
language corpora.