Agentless: Demystifying LLM-based Software Engineering Agents
July 1, 2024
Authors: Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, Lingming Zhang
cs.AI
Abstract
Recent progress in large language models (LLMs) has significantly
advanced the automation of software development tasks, including code
synthesis, program repair, and test generation. More recently, researchers and
industry practitioners have developed various autonomous LLM agents to perform
end-to-end software development tasks. These agents are equipped with the
ability to use tools, run commands, observe feedback from the environment, and
plan for future actions. However, the complexity of these agent-based
approaches, together with the limited abilities of current LLMs, raises the
following question: Do we really have to employ complex autonomous software
agents? To attempt to answer this question, we build Agentless -- an agentless
approach to automatically solve software development problems. Compared to the
verbose and complex setup of agent-based approaches, Agentless employs a
simplistic two-phase process of localization followed by repair, without
letting the LLM decide future actions or operate with complex tools. Our
results on the popular SWE-bench Lite benchmark show that, surprisingly, the
simplistic Agentless achieves both the highest performance (27.33%)
and the lowest cost ($0.34) among all existing open-source software
agents! Furthermore, we manually classified the problems in SWE-bench Lite and
found problems whose issue descriptions contain the exact ground truth patch or
are insufficient or misleading. As such, we construct SWE-bench Lite-S by excluding such
problematic issues to perform more rigorous evaluation and comparison. Our work
highlights the currently overlooked potential of a simple, interpretable
technique in autonomous software development. We hope Agentless will help reset
the baseline, starting point, and horizon for autonomous software agents, and
inspire future work along this crucial direction.
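
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`localize`, `repair`, `solve`), the prompt wording, and the `LLM` callable type are all hypothetical, and the repository is simplified to a dictionary mapping file paths to file contents. The point it tries to show is that the order of steps is fixed in ordinary code, so the model never chooses its next action or calls tools.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def localize(issue: str, repo: Dict[str, str], llm: LLM) -> List[str]:
    """Phase 1 (assumed form): ask the model which files likely need edits."""
    listing = "\n".join(sorted(repo))
    reply = llm(
        f"Issue:\n{issue}\n\nFiles:\n{listing}\n\n"
        "List the files to edit, one per line."
    )
    # Keep only names that actually exist in the repository.
    return [line.strip() for line in reply.splitlines() if line.strip() in repo]


def repair(issue: str, repo: Dict[str, str], files: List[str], llm: LLM) -> str:
    """Phase 2 (assumed form): ask the model for a patch over the localized files."""
    context = "\n\n".join(f"### {path}\n{repo[path]}" for path in files)
    return llm(
        f"Issue:\n{issue}\n\nCode:\n{context}\n\n"
        "Write a patch that fixes the issue."
    )


def solve(issue: str, repo: Dict[str, str], llm: LLM) -> str:
    """Run localization, then repair, in a fixed order with no agent loop."""
    files = localize(issue, repo, llm)
    return repair(issue, repo, files, llm)
```

Passing any prompt-to-text function as `llm` is enough to exercise the flow, which also makes the pipeline easy to test with a stub in place of a real model.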