Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Abstract

Large language model (LLM) agents have shown great potential for solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks have varying strengths, excelling at certain tasks while underperforming on others. To fully harness the diversity of these agents, we propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise. DEI functions as a meta-module atop existing SWE agent frameworks, managing agent collectives for enhanced problem-solving. Experimental results show that a DEI-guided committee of agents can surpass the best individual agent's performance by a large margin. For instance, a group of open-source SWE agents with a maximum individual resolve rate of 27.3% on SWE-Bench Lite can achieve a 34.3% resolve rate with DEI, a 25% relative improvement that beats most closed-source solutions. Our best-performing group reaches a 55% resolve rate, securing the highest ranking on SWE-Bench Lite. Our findings contribute to the growing body of research on collaborative AI systems and their potential to solve complex software engineering challenges.
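As a quick check of the arithmetic in the abstract, the minimal sketch below recomputes the gain from 27.3% to 34.3% and illustrates, on made-up data, why agents with different strengths leave room for a committee to beat the best individual. The helper `relative_improvement`, the agent names, and the toy issue IDs are illustrative assumptions, not part of DEI or SWE-Bench Lite, and the union shown is only an upper bound on what any selection scheme could reach.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Resolve rates on SWE-Bench Lite quoted in the abstract (percent).
best_single_agent = 27.3
dei_committee = 34.3

# Absolute gain in percentage points, and relative gain in percent.
absolute_gain = dei_committee - best_single_agent
relative_gain = relative_improvement(best_single_agent, dei_committee)

# Toy illustration with hypothetical issue IDs (NOT benchmark data):
# each agent resolves a different subset of issues.
toy_resolved = {
    "agent_a": {1, 2, 3},
    "agent_b": {3, 4},
    "agent_c": {5},
}

# The best single agent resolves 3 issues, but the union covers 5.
# A committee that always picked a correct candidate when one exists
# would reach this union, so it is an upper bound, not a guarantee.
best_individual = max(len(issues) for issues in toy_resolved.values())
committee_upper_bound = len(set().union(*toy_resolved.values()))

print(f"absolute gain: {absolute_gain:.1f} points")
print(f"relative gain: {relative_gain:.1f}%")
print(f"toy best individual: {best_individual}, union: {committee_upper_bound}")
```

The relative gain works out to roughly 25.6%, consistent with the "25% improvement" quoted in the abstract, while the absolute gain is 7 percentage points.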
