FlowReasoner: Reinforcing Query-Level Meta-Agents

Abstract

This paper proposes FlowReasoner, a query-level meta-agent that automates the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with the basic reasoning ability needed to generate multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward is designed to guide the RL training with respect to performance, complexity, and efficiency. In this manner, FlowReasoner is able to generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% in accuracy across three benchmarks. The code is available at https://github.com/sail-sg/FlowReasoner.
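As a rough illustration of how a multi-purpose reward could combine performance, complexity, and efficiency, the sketch below assumes a simple weighted sum computed from execution feedback. The abstract does not specify the actual formulation, so the `ExecutionFeedback` fields, the weights, and the normalizers are all hypothetical choices made for this example only.

```python
from dataclasses import dataclass


@dataclass
class ExecutionFeedback:
    """Hypothetical summary of running one generated multi-agent system."""
    pass_rate: float        # fraction of tests passed, in [0, 1] (performance)
    num_agent_calls: int    # assumed proxy for system complexity
    runtime_seconds: float  # assumed proxy for efficiency


def multi_purpose_reward(
    fb: ExecutionFeedback,
    w_perf: float = 1.0,
    w_complexity: float = 0.1,
    w_efficiency: float = 0.1,
    max_calls: int = 20,
    max_runtime: float = 60.0,
) -> float:
    """Weighted sum: reward performance, penalize complexity and runtime.

    All weights and normalizers are illustrative assumptions,
    not values taken from the paper.
    """
    # Clip the normalized penalties to [0, 1] so no single term dominates.
    complexity = min(fb.num_agent_calls / max_calls, 1.0)
    cost = min(fb.runtime_seconds / max_runtime, 1.0)
    return w_perf * fb.pass_rate - w_complexity * complexity - w_efficiency * cost
```

Under these assumed weights, two systems that pass the same tests are ranked by how small and fast they are, while a system that fails its tests scores below both.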
