AgensFlow: A Coordination Policy Substrate for Multi-Agent Systems

Abstract

Multi-agent systems built on large language models (LLMs) require many coordination choices that are difficult to fix a priori: which skill protocol to invoke, which agent role should perform a subtask, which model to bind to each role, how roles should interact, when to use retrieval or verification, and when to omit a step entirely. These choices interact with the task regime and with operational constraints, so static pipelines and one-off model comparisons give only a limited view of the design space. This paper introduces AgensFlow, an open-source framework that treats multi-agent coordination as an online policy-learning problem under partial observability. Rather than treating skill, role, model, topology, and evaluation choices as fixed pipeline design, the framework makes coordination decisions observable and learnable from repeated trajectories. AgensFlow is evaluated on two corpora: distributed-systems incident tasks and security-advisory tasks. The evaluation shows three main results: learned routing reaches a higher-quality operating point than a fixed-pipeline baseline on coordination-heavy classes; skip:X isolates topology compression as a meaningful part of the substrate; and warm-started policy graphs can reduce exploration cost while preserving plateau quality. Overall, the results support the claim that learned, auditable routing can improve coordination-heavy multi-agent workflows over static wiring.
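To make the idea of learning coordination choices from repeated trajectories concrete, the following is a minimal illustrative sketch, not AgensFlow's actual implementation or API. It assumes a simple epsilon-greedy bandit keyed by task class, where each action is a candidate pipeline wiring and the skip-style actions omit one step. The class `RoutingPolicy`, the action labels, the task-class names, and the reward table are all hypothetical and chosen only for illustration; the rewards are invented numbers, not results from the paper.

```python
import random
from collections import defaultdict

# Hypothetical routing actions: the full pipeline, or a variant that skips one step.
ACTIONS = ["full", "skip:retriever", "skip:verifier"]

# Made-up deterministic rewards (quality minus cost) per (task class, action).
# These numbers are illustrative assumptions, not measurements.
REWARD = {
    ("incident", "full"): 0.8,
    ("incident", "skip:retriever"): 0.5,
    ("incident", "skip:verifier"): 0.4,
    ("advisory", "full"): 0.6,
    ("advisory", "skip:retriever"): 0.7,
    ("advisory", "skip:verifier"): 0.3,
}


class RoutingPolicy:
    """Epsilon-greedy policy over routing actions, learned per task class."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (task_class, action) -> times tried
        self.values = defaultdict(float)  # (task_class, action) -> mean reward
        self.log = []                     # audit trail of every routing decision

    def greedy(self, task_class):
        """Return the action with the highest estimated reward for this class."""
        return max(self.actions, key=lambda a: self.values[(task_class, a)])

    def choose(self, task_class):
        """Try each untried action once, then explore with probability epsilon."""
        untried = [a for a in self.actions if self.counts[(task_class, a)] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(task_class)

    def update(self, task_class, action, reward):
        """Fold one observed reward into the running mean and record it."""
        key = (task_class, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
        self.log.append((task_class, action, reward))


policy = RoutingPolicy(ACTIONS, epsilon=0.1, seed=0)
task_classes = ["incident", "advisory"]

# Online loop: route a task, observe its reward, update the estimate.
for step in range(200):
    task_class = task_classes[step % len(task_classes)]
    action = policy.choose(task_class)
    policy.update(task_class, action, REWARD[(task_class, action)])

for task_class in task_classes:
    print(task_class, "->", policy.greedy(task_class))
```

In this toy setting the policy settles on different wirings for the two task classes, and the `log` list keeps a record of every decision, which is one simple way routing could be made auditable. A real system would face noisy, partially observed rewards and a much larger action space covering skills, roles, models, and topology.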