SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner

Abstract

We introduce **SWE-Flow**, a novel data synthesis framework grounded in Test-Driven Development (TDD). Unlike existing software engineering data that relies on human-submitted issues, **SWE-Flow** automatically infers incremental development steps directly from unit tests, which inherently encapsulate high-level requirements. The core of **SWE-Flow** is the construction of a Runtime Dependency Graph (RDG), which precisely captures function interactions and enables the generation of a structured, step-by-step *development schedule*. At each step, **SWE-Flow** produces a partial codebase, the corresponding unit tests, and the necessary code modifications, resulting in fully verifiable TDD tasks. With this approach, we generated 16,061 training instances and 2,020 test instances from real-world GitHub projects, creating the **SWE-Flow-Eval** benchmark. Our experiments show that fine-tuning open models on this dataset significantly improves performance in TDD-based coding. To facilitate further research, we release all code, datasets, models, and Docker images on [GitHub](https://github.com/Hambaobao/SWE-Flow).
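To make the idea of deriving a development schedule from a dependency graph more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `calls` and `tests` mappings, the function names in them, and the helpers `reachable` and `build_schedule` are all hypothetical, and the ordering rule (callees before callers, one step per test) is only one plausible reading of the abstract.

```python
from graphlib import TopologicalSorter

# Hypothetical runtime call edges: caller -> callees observed while tests ran.
calls = {
    "tokenize": set(),
    "parse": {"tokenize"},
    "evaluate": {"parse"},
}

# Hypothetical map from each unit test to the functions it calls directly.
tests = {
    "test_evaluate": {"evaluate"},
    "test_tokenize": {"tokenize"},
    "test_parse": {"parse"},
}


def reachable(entry_points, graph):
    """Return every function reachable from the entry points via call edges."""
    seen, stack = set(), list(entry_points)
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(graph.get(fn, ()))
    return seen


def build_schedule(calls, tests):
    """Order functions callees-first, then emit one incremental step per test."""
    # Make sure functions that only appear in tests are nodes of the graph.
    graph = {fn: set(callees) for fn, callees in calls.items()}
    for fns in tests.values():
        for fn in fns:
            graph.setdefault(fn, set())
    # TopologicalSorter treats the mapped values as predecessors, so callees
    # come first. It raises CycleError on recursive call chains, which this
    # sketch does not handle.
    order = list(TopologicalSorter(graph).static_order())
    rank = {fn: i for i, fn in enumerate(order)}
    needs = {t: reachable(fns, graph) for t, fns in tests.items()}
    implemented, schedule = set(), []
    # Schedule tests whose deepest dependency appears earliest first.
    for test in sorted(needs, key=lambda t: (max(rank[f] for f in needs[t]), t)):
        new = sorted(needs[test] - implemented, key=rank.get)
        implemented |= needs[test]
        schedule.append({"test": test, "implement": new})
    return schedule


schedule = build_schedule(calls, tests)
for step, item in enumerate(schedule, start=1):
    print(step, item["test"], item["implement"])
```

In this toy setup each step pairs one test with the functions that must be newly implemented for it to pass, which loosely mirrors the partial codebase, unit test, and code modification triple described above.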
