SWE-Universe: Scale Real-World Verifiable Environments to Millions
February 2, 2026
Authors: Mouxiang Chen, Lei Zhang, Yunlong Feng, Xuwu Wang, Wenting Zhao, Ruisheng Cao, Jiaxi Yang, Jiawei Chen, Mingze Li, Zeyao Ma, Hao Ge, Zongmeng Zhang, Zeyu Cui, Dayiheng Liu, Jingren Zhou, Jianling Sun, Junyang Lin, Binyuan Hui
cs.AI
Abstract
We propose SWE-Universe, a scalable and efficient framework for automatically constructing verifiable real-world software engineering (SWE) environments from GitHub pull requests (PRs). To overcome the prevalent challenges of automatic environment construction, namely low production yield, weak verifiers, and prohibitive cost, our framework uses a building agent powered by an efficient, custom-trained model. The agent performs iterative self-verification and in-loop hacking detection to reliably generate high-fidelity, verifiable tasks. With this method, we scale real-world multilingual SWE environments to the order of a million (807,693 environments). We demonstrate the value of these environments through large-scale agentic mid-training and reinforcement learning experiments. Finally, applying this technique to Qwen3-Max-Thinking, we achieve a score of 75.3% on SWE-Bench Verified. Our work provides both a critical resource and a robust methodology for advancing the next generation of coding agents.
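The abstract does not state how self-verification decides that a generated task is "verifiable". As a hedged illustration only, the sketch below assumes the common SWE-bench-style fail-to-pass criterion: a candidate verifier is accepted only if it fails on the pre-PR code and passes on the post-PR code, and the agent retries with feedback until a candidate passes or an attempt budget runs out. The names `build_environment`, `is_fail_to_pass`, `propose`, `run`, and `max_attempts` are hypothetical, test runs are simulated with a dictionary, and this is not the paper's actual procedure.

```python
from typing import Callable, Optional

# Hypothetical signature: run(verifier, state) executes a verifier (e.g. a test command)
# against the repository at `state` ("before" = pre-PR, "after" = post-PR) and
# returns True if it passes.
RunFn = Callable[[str, str], bool]
# Hypothetical signature: propose(attempt, feedback) asks the building agent for a
# candidate verifier, given feedback from the previous failed attempt (None at first).
ProposeFn = Callable[[int, Optional[str]], str]


def is_fail_to_pass(run: RunFn, verifier: str) -> bool:
    """Assumed acceptance check: the verifier must fail before the PR and pass after it."""
    return (not run(verifier, "before")) and run(verifier, "after")


def build_environment(propose: ProposeFn, run: RunFn, max_attempts: int = 3) -> Optional[str]:
    """Iteratively propose and self-check verifiers; return the first accepted one, else None."""
    feedback: Optional[str] = None
    for attempt in range(max_attempts):
        verifier = propose(attempt, feedback)
        if is_fail_to_pass(run, verifier):
            return verifier
        # Feed the failure back so the next proposal can be revised.
        feedback = f"attempt {attempt}: '{verifier}' is not fail-to-pass"
    return None


# Toy usage with simulated outcomes instead of real test runs.
outcomes = {
    ("pytest tests/test_a.py", "before"): True,   # passes before the PR: rejected
    ("pytest tests/test_a.py", "after"): True,
    ("pytest tests/test_b.py", "before"): False,  # fails before, passes after: accepted
    ("pytest tests/test_b.py", "after"): True,
}
proposals = ["pytest tests/test_a.py", "pytest tests/test_b.py"]
result = build_environment(lambda i, fb: proposals[i], lambda v, s: outcomes[(v, s)])
print(result)  # pytest tests/test_b.py
```

Rejecting verifiers that already pass on the pre-PR code is one simple guard against trivially satisfiable checks; the in-loop hacking detection described in the abstract is presumably more involved and is not modeled here.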