AgensFlow: A Foundation for Coordination Policies in Multi-Agent Systems

Abstract

Multi-agent systems built on large language models (LLMs) require many coordination choices that are difficult to fix a priori: which skill protocol to invoke, which agent role should perform a subtask, which model to bind to each role, how roles should interact, when to use retrieval or verification, and when to omit a step entirely. These choices interact with the task regime and with operational constraints, so static pipelines and one-off model comparisons give only a limited view of the design space. This paper introduces AgensFlow, an open-source framework that treats multi-agent coordination as an online policy-learning problem under partial observability. Rather than treating skill, role, model, topology, and evaluation choices as fixed pipeline design, the framework makes coordination decisions observable and learnable from repeated trajectories. AgensFlow is evaluated on two corpora: distributed-systems incident tasks and security-advisory tasks. The evaluation yields three main results: learned routing reaches a higher-quality operating point than a fixed-pipeline baseline on coordination-heavy classes; skip:X isolates topology compression as a meaningful part of the substrate; and warm-started policy graphs can reduce exploration cost while preserving plateau quality. Overall, the results support the claim that learned, auditable routing can improve coordination-heavy multi-agent workflows over static wiring.