HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

September 9, 2024
Authors: Huy Nhat Phan, Phong X. Nguyen, Nghi D. Q. Bui
cs.AI

Abstract

Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced autonomous software agents based on LLMs for end-to-end development tasks, these systems are typically designed for specific SE tasks. We introduce HyperAgent, a novel generalist multi-agent system designed to address a wide spectrum of SE tasks across different programming languages by mimicking human developers' workflows. Comprising four specialized agents (Planner, Navigator, Code Editor, and Executor), HyperAgent manages the full lifecycle of SE tasks, from initial conception to final verification. Through extensive evaluations, HyperAgent achieves state-of-the-art performance across diverse SE tasks: for GitHub issue resolution, it attains a 25.01% success rate on SWE-Bench-Lite and 31.40% on SWE-Bench-Verified, surpassing existing methods. Furthermore, HyperAgent demonstrates state-of-the-art performance in repository-level code generation (RepoExec) and in fault localization and program repair (Defects4J), often outperforming specialized systems. This work represents a significant advancement towards versatile, autonomous agents capable of handling complex, multi-step SE tasks across various domains and languages, potentially transforming AI-assisted software development practices.
