SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Abstract

Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have pushed this progress further by enabling autonomous, tool-using agents to tackle complex software engineering tasks. Existing agent-based approaches to issue resolution rely mainly on agents exploring independently, so they often get stuck in local solutions and fail to identify issue patterns that span different parts of the codebase. To address this limitation, we propose SWE-Debate, a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization. SWE-Debate first creates multiple fault propagation traces as localization proposals by traversing a code dependency graph. It then organizes a three-round debate among specialized agents, each embodying a distinct reasoning perspective along the fault propagation trace. This structured competition enables the agents to collaboratively converge on a consolidated fix plan. Finally, the consolidated fix plan is integrated into an MCTS-based code modification agent for patch generation. Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results among open-source agent frameworks and outperforms baselines by a large margin.
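As a rough illustration of the first step, the sketch below enumerates candidate traces by walking a dependency graph depth-first from an entry point. It is only a sketch under stated assumptions, not the method used by SWE-Debate: the graph contents, the function name `enumerate_traces`, and the `max_depth` cutoff are all hypothetical, and the abstract does not specify how traces are actually built or ranked.

```python
from typing import Dict, List

# Hypothetical dependency graph: each key depends on (calls or imports) its values.
DependencyGraph = Dict[str, List[str]]


def enumerate_traces(graph: DependencyGraph, start: str, max_depth: int = 4) -> List[List[str]]:
    """Return every simple path from `start`, ending at a leaf or at `max_depth` nodes."""
    traces: List[List[str]] = []

    def walk(path: List[str]) -> None:
        node = path[-1]
        # Skip nodes already on the path so cycles cannot cause infinite recursion.
        successors = [n for n in graph.get(node, []) if n not in path]
        if not successors or len(path) == max_depth:
            traces.append(path)
            return
        for nxt in successors:
            walk(path + [nxt])

    walk([start])
    return traces


# Made-up example graph (assumption, for illustration only).
graph: DependencyGraph = {
    "views.render": ["forms.clean", "models.save"],
    "forms.clean": ["validators.check"],
    "models.save": ["validators.check", "db.write"],
}

traces = enumerate_traces(graph, "views.render")
for trace in traces:
    print(" -> ".join(trace))
```

Each returned path could serve as one localization proposal for the debate stage. In this toy graph, two of the proposals pass through the same downstream node by different routes, which is the kind of cross-module pattern the abstract says independent exploration tends to miss.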
