HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
September 9, 2024
Authors: Huy Nhat Phan, Phong X. Nguyen, Nghi D. Q. Bui
cs.AI
Abstract
Large Language Models (LLMs) have revolutionized software engineering (SE),
demonstrating remarkable capabilities in various coding tasks. While recent
efforts have produced autonomous software agents based on LLMs for end-to-end
development tasks, these systems are typically designed for specific SE tasks.
We introduce HyperAgent, a novel generalist multi-agent system designed to
address a wide spectrum of SE tasks across different programming languages by
mimicking human developers' workflows. Comprising four specialized agents
(Planner, Navigator, Code Editor, and Executor), HyperAgent manages the full
lifecycle of SE tasks, from initial conception to final verification. Through
extensive evaluations, HyperAgent achieves state-of-the-art performance across
diverse SE tasks: it attains a 25.01% success rate on SWE-Bench-Lite and 31.40%
on SWE-Bench-Verified for GitHub issue resolution, surpassing existing methods.
Furthermore, HyperAgent demonstrates SOTA performance in repository-level code
generation (RepoExec), and in fault localization and program repair
(Defects4J), often outperforming specialized systems. This work represents a
significant advancement towards versatile, autonomous agents capable of
handling complex, multi-step SE tasks across various domains and languages,
potentially transforming AI-assisted software development practices.