What Should Agents Say? Action-State Communication for Efficient Multi-Agent Systems

Abstract

Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared context window, and ultimately affect both system performance and inference cost. We analyze five common inter-agent communication strategies across two MAS topologies and find that no fixed strategy is universally optimal. Instead, effective inter-agent messages consistently preserve the action-centered information that downstream agents need. Building on this finding, we propose PACT (Protocolized Action-state Communication and Transmission), which treats inter-agent communication as a public state-update problem and projects each raw agent output into a compact action-state record before it enters the shared history. Across different MAS topologies, PACT consistently improves the performance-cost trade-off, achieving comparable or stronger task performance with substantially fewer tokens. The gains extend to production coding harnesses: PACT raises the resolve rate of OpenHands while reducing tokens-per-resolved by 10%, and it leaves the resolve rate of SWE-agent unchanged while halving input tokens. Our code is publicly available at https://github.com/iNLP-Lab/PACT.
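The projection step described above can be illustrated with a minimal Python sketch. It is not taken from the PACT repository: the record fields (action, state, next_step), the ACTION:/STATE:/NEXT: line tags, and the names ActionStateRecord, project, and SharedHistory are all hypothetical, and the rule-based filter merely stands in for whatever projection the actual method uses. The sketch only shows the general idea that a raw agent output is reduced to a compact record before it is appended to the shared history.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ActionStateRecord:
    """Hypothetical compact record; the field names are assumptions, not PACT's schema."""
    agent: str
    action: str          # what the agent did or decided
    state: str           # resulting state that downstream agents need
    next_step: str = ""  # suggested follow-up, if any


def project(agent: str, raw_output: str, max_chars: int = 200) -> ActionStateRecord:
    """Toy projection: keep lines tagged ACTION:/STATE:/NEXT: and drop everything else."""
    slots = {"ACTION": "", "STATE": "", "NEXT": ""}
    for line in raw_output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().upper()
        if sep and key in slots:
            # Truncate each slot so a single record stays small.
            slots[key] = value.strip()[:max_chars]
    return ActionStateRecord(agent, slots["ACTION"], slots["STATE"], slots["NEXT"])


@dataclass
class SharedHistory:
    """Shared history that stores projected records instead of raw agent text."""
    records: list = field(default_factory=list)

    def append_raw(self, agent: str, raw_output: str) -> ActionStateRecord:
        # The raw output is projected first; only the record is stored.
        record = project(agent, raw_output)
        self.records.append(record)
        return record

    def render(self) -> str:
        """Serialize the history as it would be shown to the next agent."""
        return "\n".join(
            f"[{r.agent}] action={r.action} | state={r.state} | next={r.next_step}"
            for r in self.records
        )


if __name__ == "__main__":
    raw = (
        "Let me think through this carefully before I decide what to change here\n"
        "ACTION: patched parse_config in utils.py\n"
        "I considered several alternatives and discarded them after some thought\n"
        "STATE: 2 of 3 tests pass\n"
        "NEXT: fix test_empty_input\n"
    )
    history = SharedHistory()
    history.append_raw("coder", raw)
    print(history.render())
```

In this toy setting, downstream agents read only the rendered records, so the free-form reasoning in the raw output never consumes shared context. How the real system decides what counts as action-centered information is not specified in the abstract.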