Agentless: Demystifying LLM-based Software Engineering Agents

Abstract

Recent advances in large language models (LLMs) have significantly improved the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents can use tools, run commands, observe feedback from the environment, and plan future actions. However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the following question: Do we really have to employ complex autonomous software agents? To attempt to answer this question, we build Agentless, an agentless approach to automatically solve software development problems. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic two-phase process of localization followed by repair, without letting the LLM decide future actions or operate with complex tools. Our results on the popular SWE-bench Lite benchmark show that, surprisingly, the simplistic Agentless achieves both the highest performance (27.33%) and the lowest cost ($0.34) compared with all existing open-source software agents! Furthermore, we manually classified the problems in SWE-bench Lite and found problems whose issue descriptions contain the exact ground-truth patch or are insufficient or misleading. As such, we construct SWE-bench Lite-S by excluding such problematic issues to perform more rigorous evaluation and comparison. Our work highlights the currently overlooked potential of a simple, interpretable technique in autonomous software development. We hope Agentless will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction.
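To make the contrast with agent loops concrete, the two-phase idea can be sketched as a fixed pipeline in which the model is called at predetermined points and never chooses the next action. This is only an illustrative sketch, not the authors' implementation: the abstract states only that localization is followed by repair, so the function names (`localize`, `repair`, `solve`), the prompts, the `top_k` parameter, the whole-file patch format, and the stubbed `fake_llm` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class Patch:
    path: str         # file the patch applies to
    new_content: str  # full replacement content (an assumed patch format)


def localize(issue: str, repo: Dict[str, str], llm: LLM, top_k: int = 1) -> List[str]:
    """Phase 1: ask the model which files are likely relevant to the issue."""
    listing = "\n".join(sorted(repo))
    answer = llm(f"Issue:\n{issue}\n\nFiles:\n{listing}\n\nList the files to edit, one per line.")
    # Keep only lines that name real files, so an invented path cannot propagate.
    picked = [line.strip() for line in answer.splitlines() if line.strip() in repo]
    return picked[:top_k]


def repair(issue: str, repo: Dict[str, str], paths: List[str], llm: LLM) -> List[Patch]:
    """Phase 2: ask the model for a fixed version of each localized file."""
    patches = []
    for path in paths:
        fixed = llm(f"Issue:\n{issue}\n\nFile {path}:\n{repo[path]}\n\nReturn the fixed file.")
        patches.append(Patch(path, fixed))
    return patches


def solve(issue: str, repo: Dict[str, str], llm: LLM) -> List[Patch]:
    """Fixed control flow: localization, then repair. No tool use, no planning loop."""
    return repair(issue, repo, localize(issue, repo, llm), llm)


# Stub model for demonstration only; a real system would call an actual LLM here.
def fake_llm(prompt: str) -> str:
    if "List the files" in prompt:
        return "calc.py\nmissing.py"  # second path does not exist and is filtered out
    return "def add(a, b):\n    return a + b\n"


repo = {"calc.py": "def add(a, b):\n    return a - b\n", "README.md": "demo"}
patches = solve("add() subtracts instead of adding", repo, fake_llm)
for p in patches:
    print(p.path)
    print(p.new_content)
```

The point of the sketch is the shape of `solve`: the sequence of steps is written by the programmer, so each intermediate result (the localized files, the proposed patches) can be inspected directly, which is one plausible reading of the interpretability the abstract emphasizes.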
