Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Abstract

The captivating realm of Minecraft has attracted substantial research interest in recent years, serving as a rich platform for developing intelligent agents capable of functioning in open-world environments. However, current research focuses predominantly on specific objectives, such as the popular "ObtainDiamond" task, and has not yet shown effective generalization to a broader spectrum of tasks. Furthermore, the current leading success rate for the "ObtainDiamond" task stands at around 20%, highlighting the limitations of the Reinforcement Learning (RL) based controllers used in existing methods. To tackle these challenges, we introduce Ghost in the Minecraft (GITM), a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory, aiming to create Generally Capable Agents (GCAs) in Minecraft. These agents, equipped with the logic and common-sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments through text-based interactions. We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute. The resulting LLM-based agent markedly surpasses previous methods, achieving an improvement of +47.5% in success rate on the "ObtainDiamond" task and demonstrating superior robustness compared to traditional RL-based controllers. Notably, our agent is the first to procure all items in the Minecraft Overworld technology tree, demonstrating its extensive capabilities. GITM does not need any GPU for training; a single CPU node with 32 CPU cores is enough. This research shows the potential of LLMs in developing capable agents that handle long-horizon, complex tasks and adapt to uncertainties in open-world environments. See the project website at https://github.com/OpenGVLab/GITM.
