Agentless: Demystifying LLM-based Software Engineering Agents

Abstract

Recent advancements in large language models (LLMs) have substantially improved the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents are equipped with the ability to use tools, run commands, observe feedback from the environment, and plan future actions. However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the following question: Do we really have to employ complex autonomous software agents? To attempt to answer this question, we build Agentless, an agentless approach to automatically solving software development problems. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simple two-phase process of localization followed by repair, without letting the LLM decide future actions or operate complex tools. Our results on the popular SWE-bench Lite benchmark show that, surprisingly, the simple Agentless approach achieves both the highest performance (27.33%) and the lowest cost ($0.34) compared with all existing open-source software agents! Furthermore, we manually classified the problems in SWE-bench Lite and found problems whose issue descriptions contain the exact ground-truth patch or are insufficient or misleading. We therefore construct SWE-bench Lite-S by excluding such problematic issues, enabling a more rigorous evaluation and comparison. Our work highlights the currently overlooked potential of simple, interpretable techniques in autonomous software development. We hope Agentless will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction.
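The localization-then-repair idea can be illustrated with a small sketch. This is a deliberately simplified illustration, not the authors' implementation: the `Problem` type, the `localize`, `repair`, and `solve` functions, the prompt wording, and the `scripted_llm` stand-in are all hypothetical, and "the LLM" is modeled as any function that maps a prompt string to generated text. What the sketch tries to show is the property stressed above: the control flow is fixed in code, so the model only answers two kinds of prompts and never decides which action to take next or invokes tools.

```python
# Hypothetical sketch of a fixed two-phase (localize, then repair) pipeline.
# Not the Agentless implementation; all names and prompts are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model interface: a prompt goes in, generated text comes out.
LLM = Callable[[str], str]


@dataclass
class Problem:
    issue: str             # natural-language issue description
    repo: Dict[str, str]   # file path -> current file contents


def localize(problem: Problem, llm: LLM, top_k: int = 3) -> List[str]:
    """Phase 1: one model call that names the files most likely to need edits."""
    listing = "\n".join(sorted(problem.repo))
    prompt = (f"Issue:\n{problem.issue}\n\nRepository files:\n{listing}\n\n"
              "List the files to edit, one path per line.")
    answer = llm(prompt)
    # Discard any path the model names that does not exist in the repository.
    candidates = [line.strip() for line in answer.splitlines()]
    return [path for path in candidates if path in problem.repo][:top_k]


def repair(problem: Problem, files: List[str], llm: LLM) -> Dict[str, str]:
    """Phase 2: one model call per localized file, asking for a fixed version."""
    patched = {}
    for path in files:
        prompt = (f"Issue:\n{problem.issue}\n\nFile {path}:\n{problem.repo[path]}\n\n"
                  "Return the complete fixed file.")
        patched[path] = llm(prompt)
    return patched


def solve(problem: Problem, llm: LLM) -> Dict[str, str]:
    # Fixed control flow: the model never chooses the next step or calls tools.
    return repair(problem, localize(problem, llm), llm)


# Usage with a scripted stand-in for a real model (hypothetical).
def scripted_llm(prompt: str) -> str:
    if "List the files to edit" in prompt:
        return "calc.py\nimaginary.py"
    return "def add(a, b):\n    return a + b\n"


demo = Problem(issue="add() subtracts instead of adding",
               repo={"calc.py": "def add(a, b):\n    return a - b\n",
                     "README.md": "# calc\n"})
fixed = solve(demo, scripted_llm)
print(fixed["calc.py"])
```

Because `localize` keeps only paths that exist in the repository, a hallucinated file name such as `imaginary.py` is dropped before the repair phase. For real use, `scripted_llm` would have to be replaced with an actual model client, and the published Agentless pipeline is more elaborate than this two-call outline.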