Agentless: Demystifying LLM-based Software Engineering Agents
July 1, 2024
Authors: Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, Lingming Zhang
cs.AI
Abstract
Recent progress in large language models (LLMs) has significantly
advanced the automation of software development tasks, including code
synthesis, program repair, and test generation. More recently, researchers and
industry practitioners have developed various autonomous LLM agents to perform
end-to-end software development tasks. These agents are equipped with the
ability to use tools, run commands, observe feedback from the environment, and
plan for future actions. However, the complexity of these agent-based
approaches, together with the limited abilities of current LLMs, raises the
following question: Do we really have to employ complex autonomous software
agents? To attempt to answer this question, we build Agentless, an agentless
approach to automatically solve software development problems. Compared to the
verbose and complex setup of agent-based approaches, Agentless employs a
simple two-phase process of localization followed by repair, without
letting the LLM decide future actions or operate complex tools. Our
results on the popular SWE-bench Lite benchmark show that, surprisingly, the
simple Agentless achieves both the highest performance (27.33%)
and the lowest cost ($0.34) compared with all existing open-source software
agents. Furthermore, we manually classified the problems in SWE-bench Lite and
found problems that include the exact ground truth patch or have insufficient
or misleading issue descriptions. We therefore construct SWE-bench Lite-S by
excluding such problematic issues to enable more rigorous evaluation and
comparison. Our work
highlights the currently overlooked potential of a simple, interpretable
technique in autonomous software development. We hope Agentless will help reset
the baseline, starting point, and horizon for autonomous software agents, and
inspire future work along this crucial direction.
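
The two-phase idea described in the abstract (localization followed by repair, with a fixed control flow rather than an LLM choosing its next action) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details of the prompts, the granularity of localization, or how patches are selected, so every name here (`LLM`, `Patch`, `localize`, `repair`, `solve`, `stub_llm`) is hypothetical, and the model is replaced by a stub so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class Patch:
    path: str         # file chosen by the localization phase
    new_content: str  # replacement content proposed by the repair phase


def localize(issue: str, repo: Dict[str, str], llm: LLM) -> str:
    """Phase 1: ask the model which file most likely needs to change."""
    listing = "\n".join(sorted(repo))
    prompt = f"Issue:\n{issue}\n\nFiles:\n{listing}\n\nName the one file to edit."
    answer = llm(prompt).strip()
    if answer not in repo:
        raise ValueError(f"model named an unknown file: {answer!r}")
    return answer


def repair(issue: str, path: str, source: str, llm: LLM) -> Patch:
    """Phase 2: ask the model for a corrected version of the localized file."""
    prompt = f"Issue:\n{issue}\n\nFile {path}:\n{source}\n\nReturn the corrected file."
    return Patch(path=path, new_content=llm(prompt))


def solve(issue: str, repo: Dict[str, str], llm: LLM) -> Patch:
    """Fixed pipeline: the order of steps is hard-coded, not chosen by the model."""
    path = localize(issue, repo, llm)
    return repair(issue, path, repo[path], llm)


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model (assumption for this demo only)."""
    if "Name the one file to edit" in prompt:
        return "calc.py"
    return "def add(a, b):\n    return a + b\n"


# Toy repository represented as a mapping from path to file content.
repo = {
    "README.md": "demo project",
    "calc.py": "def add(a, b):\n    return a - b\n",
}
patch = solve("add() subtracts instead of adding", repo, stub_llm)
print(patch.path)
print(patch.new_content)
```

The point of the sketch is the shape of `solve`: the model is called exactly twice, in a predetermined order, and never decides which tool to run or what to do next.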