AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs

Abstract

Large language models (LLMs) have demonstrated powerful problem-solving capabilities, particularly when organized into multi-agent systems. However, the advent of such systems also raises questions about whether a complex network of agents can effectively self-organize and collaborate. Performance on standard reasoning benchmarks indicates how well multi-agent systems solve reasoning tasks, but it does not show whether these systems leverage their topology effectively. Here, we propose AgentsNet, a new benchmark for multi-agent reasoning. Drawing inspiration from classical problems in distributed systems and graph theory, AgentsNet measures the ability of multi-agent systems to collaboratively form strategies for problem-solving, self-organization, and effective communication given a network topology. We evaluate a variety of baseline methods on AgentsNet, including homogeneous networks of agents that must first agree on basic protocols for organization and communication. We find that some frontier LLMs already perform strongly on small networks, but their performance declines as the network grows. While existing multi-agent benchmarks cover at most 2-5 agents, AgentsNet is practically unlimited in size and can scale with new generations of LLMs. We therefore also probe frontier models in a setup with up to 100 agents.
