SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

July 31, 2025
Authors: Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, Qianxiang Wang
cs.AI

Abstract

Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have advanced it further by enabling autonomous, tool-using agents to tackle complex software engineering tasks. Existing agent-based issue resolution approaches rely primarily on each agent's independent exploration, so they often get stuck in local solutions and fail to identify issue patterns that span different parts of the codebase. To address this limitation, we propose SWE-Debate, a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization. SWE-Debate first creates multiple fault propagation traces as localization proposals by traversing a code dependency graph. It then organizes a three-round debate among specialized agents, each embodying a distinct reasoning perspective along the fault propagation traces. This structured competition enables the agents to converge collaboratively on a consolidated fix plan. Finally, the consolidated fix plan is integrated into a Monte Carlo Tree Search (MCTS)-based code modification agent for patch generation. Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results among open-source agent frameworks and outperforms baselines by a large margin.
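The abstract does not say how the code dependency graph is built or how fault propagation traces are selected from it. As a rough illustration only, the trace-generation step can be pictured as a bounded depth-first enumeration of simple paths starting from a suspected entry point. The graph, the node names, the function `fault_propagation_traces`, and its `max_depth` parameter are all hypothetical and are not SWE-Debate's actual implementation.

```python
from typing import Dict, List

# Hypothetical dependency graph: an edge A -> B means "A calls or imports B".
Graph = Dict[str, List[str]]


def fault_propagation_traces(graph: Graph, seed: str, max_depth: int = 3) -> List[List[str]]:
    """Enumerate simple paths from `seed` with at most `max_depth` edges.

    Each returned path is one candidate trace along which a fault might
    propagate. A path is returned once it cannot be extended further, either
    because the depth budget is spent or because every successor is already
    on the path (which avoids looping forever on cycles).
    """
    traces: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        node = path[-1]
        extended = False
        if len(path) - 1 < max_depth:  # number of edges so far
            for nxt in graph.get(node, []):
                if nxt not in path:  # keep paths simple (no repeated nodes)
                    dfs(path + [nxt])
                    extended = True
        if not extended:
            traces.append(path)

    dfs([seed])
    return traces


# Illustrative graph with a cycle: load_config -> parse_args -> load_config.
example_graph: Graph = {
    "parse_args": ["load_config", "validate"],
    "load_config": ["read_file", "parse_args"],
    "validate": ["check_schema"],
    "check_schema": ["load_config"],
}

if __name__ == "__main__":
    for trace in fault_propagation_traces(example_graph, "parse_args", max_depth=3):
        print(" -> ".join(trace))
```

Restricting the search to simple paths keeps the cycle between `load_config` and `parse_args` from being followed endlessly, and the depth cap bounds how many proposals are produced. In the framework the abstract describes, traces of this kind serve as the localization proposals that the specialized agents then debate.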