Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
August 13, 2024
Authors: Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong
cs.AI
Abstract
Large language model (LLM) agents have shown great potential in solving
real-world software engineering (SWE) problems. The most advanced open-source
SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite.
However, these sophisticated agent frameworks exhibit varying strengths,
excelling in certain tasks while underperforming in others. To fully harness
the diversity of these agents, we propose DEI (Diversity Empowered
Intelligence), a framework that leverages their unique expertise. DEI functions
as a meta-module atop existing SWE agent frameworks, managing agent collectives
for enhanced problem-solving. Experimental results show that a DEI-guided
committee of agents is able to surpass the best individual agent's performance
by a large margin. For instance, a group of open-source SWE agents, with a
maximum individual resolve rate of 27.3% on SWE-Bench Lite, can achieve a 34.3%
resolve rate with DEI, a relative improvement of about 25% that beats most
closed-source solutions. Our best-performing group reaches a 55% resolve rate, securing
the highest ranking on SWE-Bench Lite. Our findings contribute to the growing
body of research on collaborative AI systems and their potential to solve
complex software engineering challenges.
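The abstract's central claim, that a committee can outperform its best member because different agents resolve different issues, can be illustrated with a small sketch. The abstract does not describe how DEI chooses among candidate solutions, so the code below does not attempt to reproduce it. It only computes the ceiling that a hypothetical perfect selector would reach (the union of issues resolved by any agent) and checks the relative-improvement arithmetic from the reported 27.3% and 34.3% figures. The agent names, issue IDs, and function names are all made up for illustration.

```python
from typing import Dict, Set


def resolve_rate(resolved: Set[str], total: int) -> float:
    """Percentage of benchmark issues that were resolved."""
    return 100.0 * len(resolved) / total


def oracle_committee(per_agent: Dict[str, Set[str]]) -> Set[str]:
    """Issues resolved by at least one agent.

    This is an upper bound for any selector over the agents' outputs,
    not a description of how DEI itself picks a solution.
    """
    union: Set[str] = set()
    for resolved in per_agent.values():
        union |= resolved
    return union


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical toy data: which issues each (made-up) agent resolves.
toy_results: Dict[str, Set[str]] = {
    "agent_a": {"issue-1", "issue-2", "issue-3"},
    "agent_b": {"issue-2", "issue-4"},
    "agent_c": {"issue-5"},
}
TOTAL_ISSUES = 10  # assumed size of the toy benchmark

best_individual = max(resolve_rate(r, TOTAL_ISSUES) for r in toy_results.values())
committee_ceiling = resolve_rate(oracle_committee(toy_results), TOTAL_ISSUES)

print(f"best individual: {best_individual:.1f}%")
print(f"oracle committee ceiling: {committee_ceiling:.1f}%")

# Figures reported in the abstract: 27.3% best individual, 34.3% with DEI.
reported_gain = relative_improvement(27.3, 34.3)
print(f"relative improvement from reported figures: {reported_gain:.1f}%")
```

In the toy data no single agent resolves more than three of the ten issues, yet five distinct issues are resolved by at least one agent; that gap is the headroom a good selector can exploit. The reported figures work out to a relative gain of roughly 25.6%, consistent with the 25% stated in the abstract.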