AgentConductor: A Topology-Evolving Framework for Multi-Agent Competition-Level Code Generation

Abstract

Multi-agent systems (MAS) driven by large language models (LLMs) coordinate specialized agents through predefined interaction topologies and have shown promise for complex tasks such as competition-level code generation. Recent studies show that carefully designed multi-agent workflows and communication graphs can significantly improve code generation performance through collaborative reasoning. However, existing methods neither adapt topology density to task difficulty nor use execution feedback to refine the topology iteratively within a single instance, which leads to redundant communication and performance bottlenecks. To address these issues, we propose AgentConductor, a reinforcement-learning-optimized MAS built around an LLM-based orchestrator agent that generates interaction topologies dynamically, end to end, from feedback. For each query, AgentConductor infers agent roles and task difficulty, then constructs a task-adapted, density-aware layered directed acyclic graph (DAG) topology. Two innovations underpin this design. First, we design a novel topological density function that characterizes the communication behavior of multi-agent interactions mathematically. Second, we partition task difficulty into intervals, which avoids excessive pruning and allows the upper bound on topological density to be measured precisely for each difficulty level and controlled at a finer granularity. Across three competition-level and two foundational code datasets, AgentConductor achieves state-of-the-art accuracy, outperforming the strongest baseline with up to 14.6% higher pass@1 accuracy, 13% lower topology density, and 68% lower token cost.
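
To make the idea of a density-bounded layered DAG concrete, the following is a minimal sketch under stated assumptions. The abstract does not define the topological density function or give any density bounds, so this example substitutes plain edge density (edges present divided by edges possible between earlier and later layers) and uses made-up per-difficulty caps. The names `DENSITY_CAP`, `max_edges`, `density`, and `is_valid`, and the agent names, are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical density caps per difficulty level; the abstract gives no values.
DENSITY_CAP = {"easy": 0.4, "medium": 0.7, "hard": 1.0}


def max_edges(layers):
    """Count the edges of a fully connected layered DAG (earlier layer -> later layer)."""
    sizes = [len(layer) for layer in layers]
    return sum(a * b for a, b in combinations(sizes, 2))


def density(layers, edges):
    """Plain edge density, used here as a stand-in for the paper's density function."""
    possible = max_edges(layers)
    return len(edges) / possible if possible else 0.0


def is_valid(layers, edges, difficulty):
    """Check that every edge points to a later layer and density stays under the cap."""
    level = {agent: i for i, layer in enumerate(layers) for agent in layer}
    forward = all(level[src] < level[dst] for src, dst in edges)
    return forward and density(layers, edges) <= DENSITY_CAP[difficulty]


# Illustrative three-layer topology with hypothetical agent roles.
layers = [["planner"], ["coder_a", "coder_b"], ["tester"]]
edges = [
    ("planner", "coder_a"),
    ("planner", "coder_b"),
    ("coder_a", "tester"),
    ("coder_b", "tester"),
]

print(density(layers, edges))
print(is_valid(layers, edges, "hard"))
print(is_valid(layers, edges, "medium"))
```

In this toy setting the same topology passes the cap assumed for hard tasks but exceeds the one assumed for medium tasks, which is the kind of difficulty-dependent constraint the abstract describes. How AgentConductor actually measures density and derives its bounds is not specified here.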