AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

Abstract

Large language model (LLM)-driven multi-agent systems (MAS) coordinate specialized agents through predefined interaction topologies and have shown promise on complex tasks such as competition-level code generation. Recent studies demonstrate that carefully designed multi-agent workflows and communication graphs can significantly improve code generation performance by leveraging collaborative reasoning. However, existing methods neither adapt topology density to task difficulty nor iteratively refine the topology within an instance using execution feedback, which leads to redundant communication and performance bottlenecks. To address these issues, we propose AgentConductor, a reinforcement learning-optimized MAS built around an LLM-based orchestrator agent that generates interaction topologies dynamically, driven end to end by feedback. For each query, AgentConductor infers agent roles and task difficulty, then constructs a task-adapted, density-aware layered directed acyclic graph (DAG) topology. Two key innovations underpin this construction. First, we design a novel topology density function that captures a communication-aware mathematical characterization of multi-agent interactions. Second, we partition task difficulty into intervals, which avoids excessive pruning and yields a precise upper bound on topology density for each difficulty level, along with finer-grained control. Empirically, across three competition-level and two foundational code datasets, AgentConductor achieves state-of-the-art accuracy, outperforming the strongest baseline by up to 14.6% in pass@1 accuracy, 13% in density reduction, and 68% in token cost reduction.
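To make the idea of a density-aware layered DAG concrete, the following minimal sketch may help. It is an illustration under stated assumptions, not the density function defined in the paper: density here is taken to be the fraction of possible forward (earlier-layer to later-layer) edges that are present, and the agent names, the `DENSITY_CAP` values, and the helper names are all hypothetical.

```python
def layered_dag_density(layers, edges):
    """Fraction of possible forward edges that are present.

    Assumption: this edge-ratio is only a stand-in for the paper's
    communication-aware density function, which the abstract does not define.
    layers: list of lists of agent names, ordered from first to last layer.
    edges:  set of (src, dst) pairs; each must point to a strictly later layer.
    """
    layer_of = {agent: i for i, layer in enumerate(layers) for agent in layer}
    for src, dst in edges:
        # Forward-only edges between layers guarantee the graph is acyclic.
        if layer_of[src] >= layer_of[dst]:
            raise ValueError(f"edge {src!r} -> {dst!r} does not move forward")
    sizes = [len(layer) for layer in layers]
    max_edges = sum(
        sizes[i] * sizes[j]
        for i in range(len(sizes))
        for j in range(i + 1, len(sizes))
    )
    return len(edges) / max_edges if max_edges else 0.0


# Hypothetical per-difficulty upper bounds on density (illustrative values only).
DENSITY_CAP = {"easy": 0.3, "medium": 0.6, "hard": 1.0}


def within_cap(layers, edges, difficulty):
    """Check a topology against the cap for its difficulty interval."""
    return layered_dag_density(layers, edges) <= DENSITY_CAP[difficulty]


# Hypothetical three-layer topology: one planner, two coders, one tester.
layers = [["planner"], ["coder_a", "coder_b"], ["tester"]]
edges = {
    ("planner", "coder_a"),
    ("planner", "coder_b"),
    ("coder_a", "tester"),
    ("coder_b", "tester"),
}
```

In this toy topology, 4 of the 5 possible forward edges are used, so it would fit under the hypothetical "hard" cap but exceed the "medium" one. An orchestrator following this scheme would need to prune an edge before using the topology on a medium-difficulty query.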
