AgensFlow: Foundations of Coordination Policies for Multi-Agent Systems

Abstract

Multi-agent systems built on large language models (LLMs) require many coordination choices that are difficult to fix a priori: which skill protocol to invoke, which agent role should perform a subtask, which model to bind to each role, how roles should interact, when to use retrieval or verification, and when to omit a step entirely. These choices interact with the task regime and with operational constraints, so static pipelines and one-off model comparisons provide only a limited view of the design space. This paper introduces AgensFlow, an open-source framework that treats multi-agent coordination as an online policy-learning problem under partial observability. Rather than treating skill, role, model, topology, and evaluation choices as fixed pipeline design, the framework makes coordination decisions observable and learnable from repeated trajectories. AgensFlow is evaluated on two corpora: distributed-systems incident tasks and security-advisory tasks. The evaluation shows three main results: learned routing reaches a higher-quality operating point than a fixed-pipeline baseline on coordination-heavy task classes; skip:X isolates topology compression as a meaningful part of the substrate; and warm-started policy graphs can reduce exploration cost while preserving plateau quality. Overall, the results indicate that learned, auditable routing can improve coordination-heavy multi-agent workflows over static wiring.
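To make the idea of routing as online policy learning concrete, the following is a minimal sketch and not AgensFlow's implementation. It assumes a simple epsilon-greedy bandit keyed by task class; the `Router` class, the action names (including `skip:verify`), and the reward table are all hypothetical and chosen only for illustration. It shows three ideas from the abstract in miniature: a skip action competing with a normal step, a warm start from prior estimates, and a decision log that keeps routing auditable.

```python
import random
from collections import defaultdict


class Router:
    """Hypothetical epsilon-greedy routing policy, keyed by task class."""

    def __init__(self, actions, epsilon=0.1, seed=0, prior=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (task_class, action) -> [observation count, mean reward]
        self.stats = defaultdict(lambda: [0, 0.0])
        # Warm start: seed estimates from a previously learned policy.
        for key, (count, mean) in (prior or {}).items():
            self.stats[key] = [count, mean]
        self.log = []  # audit trail: every routing decision is recorded

    def best(self, task_class):
        # Greedy action under the current estimates (ties go to the first action).
        return max(self.actions, key=lambda a: self.stats[(task_class, a)][1])

    def choose(self, task_class):
        if self.rng.random() < self.epsilon:
            action = self.rng.choice(self.actions)  # explore
        else:
            action = self.best(task_class)          # exploit
        self.log.append((task_class, action))
        return action

    def update(self, task_class, action, reward):
        entry = self.stats[(task_class, action)]
        entry[0] += 1
        entry[1] += (reward - entry[1]) / entry[0]  # incremental mean


# Assumed, made-up rewards (quality net of cost); not results from the paper.
ACTIONS = ["skip:verify", "verify"]
REWARD = {
    ("incident", "verify"): 0.9, ("incident", "skip:verify"): 0.6,
    ("advisory", "verify"): 0.7, ("advisory", "skip:verify"): 0.8,
}

router = Router(ACTIONS, epsilon=0.2, seed=0)
for _ in range(500):
    for task_class in ("incident", "advisory"):
        action = router.choose(task_class)
        router.update(task_class, action, REWARD[(task_class, action)])

# A warm-started router can act on prior estimates without exploring first.
warm = Router(ACTIONS, epsilon=0.0, prior={("incident", "verify"): (10, 0.9)})
```

Under these assumed rewards, the learned policy keeps verification for the incident class and skips it for the advisory class, which is the kind of per-regime decision a single static pipeline cannot express. A real system would also have to handle noisy rewards, multi-step topologies, and partial observability, none of which this sketch attempts.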