HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

Abstract

Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced autonomous software agents based on LLMs for end-to-end development tasks, these systems are typically designed for specific SE tasks. We introduce HyperAgent, a novel generalist multi-agent system designed to address a wide spectrum of SE tasks across different programming languages by mimicking human developers' workflows. HyperAgent comprises four specialized agents (Planner, Navigator, Code Editor, and Executor) and manages the full lifecycle of SE tasks, from initial conception to final verification. Through extensive evaluations, HyperAgent achieves state-of-the-art performance across diverse SE tasks: for GitHub issue resolution, it attains a 25.01% success rate on SWE-Bench-Lite and 31.40% on SWE-Bench-Verified, surpassing existing methods. Furthermore, HyperAgent demonstrates state-of-the-art performance in repository-level code generation (RepoExec) and in fault localization and program repair (Defects4J), often outperforming specialized systems. This work represents a significant advancement towards versatile, autonomous agents capable of handling complex, multi-step SE tasks across various domains and languages, potentially transforming AI-assisted software development practices.
