MetaAgent-X: Breaking the Limits of Automatic Multi-Agent Systems with End-to-End Reinforcement Learning

Abstract

Automatic multi-agent systems (MAS) aim to instantiate agent workflows without relying on manually designed, fixed orchestration. However, existing automatic MAS approaches remain only partially adaptive: they either perform training-free test-time search or optimize the meta-level designer while keeping the downstream execution agents frozen. This creates a frozen-executor ceiling and leaves the end-to-end training of self-designing and self-executing agentic models unexplored. To address this, we introduce MetaAgent-X, an end-to-end reinforcement learning framework that jointly optimizes automatic MAS design and execution. MetaAgent-X enables script-based MAS generation, execution rollout collection, and credit assignment for both designer and executor trajectories. To support stable and scalable optimization, we propose Executor Designer Hierarchical Rollout and Stagewise Co-evolution, which improve training stability and expose the dynamics of designer-executor co-evolution. MetaAgent-X consistently outperforms existing automatic MAS baselines, achieving gains of up to 21.7%. Comprehensive ablations show that both the designer and the executor improve throughout training, and that effective automatic MAS learning follows a stagewise co-evolution process. These results establish end-to-end trainable automatic MAS as a practical paradigm for building self-designing and self-executing agentic models.
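
The abstract does not spell out how rollouts are collected or how credit is assigned, so the following is only a toy sketch of one plausible reading of a designer-executor hierarchical rollout. It assumes that the designer samples several candidate designs per task, that each design is executed several times, that a design's reward is the mean reward of its executions, and that advantages are normalized within each group. All names (`hierarchical_rollout`, `group_advantages`, `toy_design`, `toy_execute`) and the group-normalized baseline are hypothetical illustrations, not the paper's implementation.

```python
import random
from statistics import mean, pstdev


def group_advantages(rewards):
    """Center and scale rewards within one group (assumed baseline, not from the paper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All samples scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def hierarchical_rollout(design_fn, execute_fn, task,
                         num_designs=4, runs_per_design=3, rng=None):
    """Sample designs, run each several times, and assign credit at both levels."""
    rng = rng or random.Random(0)
    # Upper level: the designer proposes candidate MAS designs for the task.
    designs = [design_fn(task, rng) for _ in range(num_designs)]
    # Lower level: each design is executed several times by the executor.
    exec_rewards = [
        [execute_fn(design, task, rng) for _ in range(runs_per_design)]
        for design in designs
    ]
    # Assumption: a design is scored by the mean reward of its executions.
    designer_rewards = [mean(rs) for rs in exec_rewards]
    return {
        "designs": designs,
        "designer_adv": group_advantages(designer_rewards),
        # Executor runs are compared only against runs of the same design.
        "executor_adv": [group_advantages(rs) for rs in exec_rewards],
    }


def toy_design(task, rng):
    """Hypothetical stand-in for a designer model: picks a number of agents."""
    return {"num_agents": rng.randint(1, 3)}


def toy_execute(design, task, rng):
    """Hypothetical stand-in for an executor: noisy binary task success."""
    return 1.0 if rng.random() < 0.3 * design["num_agents"] else 0.0


result = hierarchical_rollout(toy_design, toy_execute, task="demo")
```

In this sketch the designer and executor trajectories each receive their own advantage signal, which is the property the abstract attributes to the framework; how MetaAgent-X actually computes those signals is not stated in the text above.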