SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training

Abstract

In this technical report, we present SWE-Master, an open-source and fully reproducible post-training framework for building effective software engineering agents. SWE-Master systematically explores the complete agent development pipeline, including teacher-trajectory synthesis and data curation, long-horizon SFT, RL with real execution feedback, and inference framework design. Starting from an open-source base model with limited initial SWE capability, SWE-Master demonstrates how a systematic optimization method can elicit strong abilities for solving complex, long-horizon SWE tasks. We evaluate SWE-Master on SWE-bench Verified, a standard benchmark for realistic software engineering tasks. Under identical experimental settings, our approach achieves a resolve rate of 61.4% with Qwen2.5-Coder-32B, substantially outperforming existing open-source baselines. By further incorporating test-time scaling (TTS) with LLM-based environment feedback, SWE-Master reaches 70.8% at TTS@8, demonstrating strong performance potential. SWE-Master provides a practical and transparent foundation for advancing reproducible research on software engineering agents. The code is available at https://github.com/RUCAIBox/SWE-Master.
