Building Cooperative Embodied Agents Modularly with Large Language Models

Abstract

Large Language Models (LLMs) have demonstrated impressive planning abilities in single-agent embodied tasks across various domains. However, their capacity for planning and communication in multi-agent cooperation remains unclear, even though these are crucial skills for intelligent embodied agents. In this paper, we present a novel framework that utilizes LLMs for multi-agent cooperation and test it in various embodied environments. Our framework enables embodied agents to plan, communicate, and cooperate with other embodied agents or humans to accomplish long-horizon tasks efficiently. We demonstrate that recent LLMs, such as GPT-4, can surpass strong planning-based methods and exhibit emergent effective communication using our framework without requiring fine-tuning or few-shot prompting. We also find that LLM-based agents that communicate in natural language earn more trust from humans and cooperate with them more effectively. Our research underscores the potential of LLMs for embodied AI and lays the foundation for future research in multi-agent cooperation. Videos can be found on the project website: https://vis-www.cs.umass.edu/Co-LLM-Agents/.
