SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner
June 10, 2025
Authors: Lei Zhang, Jiaxi Yang, Min Yang, Jian Yang, Mouxiang Chen, Jiajun Zhang, Zeyu Cui, Binyuan Hui, Junyang Lin
cs.AI
Abstract
We introduce **SWE-Flow**, a novel data synthesis framework grounded in
Test-Driven Development (TDD). Unlike existing software engineering data that
rely on human-submitted issues, **SWE-Flow** automatically infers incremental
development steps directly from unit tests, which inherently encapsulate
high-level requirements. The core of **SWE-Flow** is the construction of a
Runtime Dependency Graph (RDG), which precisely captures function interactions,
enabling the generation of a structured, step-by-step *development schedule*.
At each step, **SWE-Flow** produces a partial codebase, the corresponding unit
tests, and the necessary code modifications, resulting in fully verifiable TDD
tasks. With this approach, we generated 16,061 training instances and 2,020
test instances from real-world GitHub projects, creating the **SWE-Flow-Eval**
benchmark. Our experiments show that fine-tuning open models on this dataset
significantly improves performance in TDD-based coding. To facilitate further
research, we release all code, datasets, models, and Docker images at
[GitHub](https://github.com/Hambaobao/SWE-Flow).
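
To make the idea of a Runtime Dependency Graph and a test-driven development schedule concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that call edges can be recorded with Python's standard `sys.setprofile` hook while each unit test runs, and that a schedule can be obtained by ordering callees before callers with `graphlib.TopologicalSorter`. The names `trace_calls`, `development_schedule`, and the toy functions are hypothetical and chosen only for illustration.

```python
import sys
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def trace_calls(test_fn):
    """Run test_fn and record caller -> callee edges between Python functions."""
    edges = defaultdict(set)
    stack = []  # names of the Python functions currently executing

    def profiler(frame, event, arg):
        # Only Python-level calls are tracked; C calls (c_call) are ignored.
        if event == "call":
            name = frame.f_code.co_name
            if stack:
                edges[stack[-1]].add(name)
            stack.append(name)
        elif event == "return" and stack:
            stack.pop()

    sys.setprofile(profiler)
    try:
        test_fn()
    finally:
        sys.setprofile(None)
    return edges


def development_schedule(tests):
    """Turn a list of tests into incremental steps (hypothetical format).

    Each step names one test and the functions that must be newly
    implemented for it, with callees listed before their callers.
    Assumption: no recursion, since cycles make TopologicalSorter raise.
    """
    implemented = set()
    steps = []
    for test_fn in tests:
        edges = trace_calls(test_fn)
        graph = {caller: set(callees) for caller, callees in edges.items()}
        order = [
            name
            for name in TopologicalSorter(graph).static_order()
            if name != test_fn.__name__
        ]
        new_funcs = [name for name in order if name not in implemented]
        implemented.update(new_funcs)
        steps.append({"test": test_fn.__name__, "implement": new_funcs})
    return steps


# Toy "project" and its unit tests, used only to exercise the sketch.
def parse_price(text):
    return float(text.strip("$"))


def add_tax(amount):
    return round(amount * 1.1, 2)


def total(text):
    return add_tax(parse_price(text))


def test_parse():
    assert parse_price("$10") == 10.0


def test_total():
    assert total("$10") == 11.0


if __name__ == "__main__":
    for step in development_schedule([test_parse, test_total]):
        print(step)
```

In this toy run, the first step requires only `parse_price`, and the second step adds `add_tax` and then `total`, because `parse_price` was already implemented in the previous step. A real pipeline would also need to emit the partial codebase and the code modifications for each step, which this sketch leaves out.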