MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning
May 14, 2026
Authors: Yaolun Zhang, Yujie Zhao, Nan Wang, Yiran Wu, Jiayu Chang, Yizhao Chen, Qingyun Wu, Jishen Zhao, Huazheng Wang
cs.AI
Abstract
Automatic multi-agent systems (MAS) aim to instantiate agent workflows without relying on manually designed or fixed orchestration. However, existing automatic MAS approaches remain only partially adaptive: they either perform training-free test-time search or optimize the meta-level designer while keeping the downstream execution agents frozen. This creates a frozen-executor ceiling and leaves the end-to-end training of self-designing and self-executing agentic models unexplored. To address this, we introduce MetaAgent-X, an end-to-end reinforcement learning framework that jointly optimizes automatic MAS design and execution. MetaAgent-X enables script-based MAS generation, execution rollout collection, and credit assignment for both designer and executor trajectories. To support stable and scalable optimization, we propose Executor-Designer Hierarchical Rollout and Stagewise Co-evolution, which improve training stability and expose the dynamics of designer-executor co-evolution. MetaAgent-X consistently outperforms existing automatic MAS baselines, with gains of up to 21.7%. Comprehensive ablations show that both the designer and the executor improve throughout training, and that effective automatic MAS learning follows a stagewise co-evolution process. These results establish end-to-end trainable automatic MAS as a practical paradigm for building self-designing and self-executing agentic models.