MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

Abstract

Automatic multi-agent systems (MAS) aim to instantiate agent workflows without relying on manually designed or fixed orchestration. However, existing automatic MAS approaches are only partially adaptive. They either perform training-free test-time search, or they optimize the meta-level designer while keeping the downstream execution agents frozen. This creates a frozen-executor ceiling and leaves the end-to-end training of self-designing and self-executing agentic models unexplored. To address this, we introduce MetaAgent-X, an end-to-end reinforcement learning framework that jointly optimizes automatic MAS design and execution. MetaAgent-X supports script-based MAS generation, execution rollout collection, and credit assignment for both designer and executor trajectories. To enable stable and scalable optimization, we propose Executor-Designer Hierarchical Rollout and Stagewise Co-evolution, which improve training stability and expose the dynamics of designer-executor co-evolution. MetaAgent-X consistently outperforms existing automatic MAS baselines, with gains of up to 21.7%. Comprehensive ablations show that both the designer and the executor improve throughout training, and that effective automatic MAS learning follows a stagewise co-evolution process. These results establish end-to-end trainable automatic MAS as a practical paradigm for building self-designing and self-executing agentic models.
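The abstract names a hierarchical rollout with separate credit assignment for designer and executor trajectories but does not give the scheme. What follows is a minimal sketch of one possible structure, not the MetaAgent-X implementation. For each task, the designer samples several MAS designs, and each design is executed several times. A design is credited by the mean reward of its executions relative to the other designs sampled for the same task. Each execution is credited only relative to other executions of the same design. The GRPO-style group normalization is an assumption. `hierarchical_rollout`, `sample_design`, and `execute` are hypothetical names, and the toy stubs stand in for LLM calls.

```python
import random
from statistics import mean, pstdev


def normalize(values, eps=1e-6):
    """Group-normalize rewards: (r - mean) / (std + eps). Assumed, GRPO-style."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def hierarchical_rollout(task, sample_design, execute, num_designs=4, num_execs=4):
    """Hypothetical two-level rollout: the designer samples MAS designs and the executor runs each.

    Returns the sampled designs, one designer advantage per design, and, for each
    design, one executor advantage per execution (normalized within that design).
    """
    designs = [sample_design(task) for _ in range(num_designs)]
    # rewards[k][m]: task reward of the m-th execution of design k
    rewards = [[execute(task, d) for _ in range(num_execs)] for d in designs]

    # Designer credit: score each design by the mean reward its executions earn,
    # then compare it against the other designs sampled for the same task.
    design_scores = [mean(r) for r in rewards]
    designer_adv = normalize(design_scores)

    # Executor credit: compare each execution only with executions of the same
    # design, so a weak design does not penalize a good execution of it.
    executor_adv = [normalize(r) for r in rewards]
    return designs, designer_adv, executor_adv


if __name__ == "__main__":
    # Toy stubs (hypothetical): designs are labels, and execution succeeds with a
    # design-dependent probability.
    rng = random.Random(0)
    success_rate = {"solo": 0.3, "debate": 0.5, "plan_then_solve": 0.7}

    def sample_design(task):
        return rng.choice(list(success_rate))

    def execute(task, design):
        return float(rng.random() < success_rate[design])

    designs, d_adv, e_adv = hierarchical_rollout("toy-task", sample_design, execute)
    for design, adv, group in zip(designs, d_adv, e_adv):
        print(design, round(adv, 3), [round(a, 3) for a in group])
```

Separating the two normalization groups is the key design choice in this sketch. It keeps the designer signal from being dominated by execution noise, and it keeps the executor signal from being dominated by design quality. A joint or staged schedule, such as the Stagewise Co-evolution named above, would then decide which of the two advantage sets updates which policy at each phase. The abstract does not specify that schedule.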