

Training Language Model Agents to Find Vulnerabilities with CTF-Dojo

August 25, 2025
Authors: Terry Yue Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang
cs.AI

Abstract

Large language models (LLMs) have demonstrated exceptional capabilities when trained within executable runtime environments, notably excelling at software engineering tasks through verified feedback loops. Yet, scalable and generalizable execution-grounded environments remain scarce, limiting progress in training more capable ML agents. We introduce CTF-Dojo, the first large-scale executable runtime tailored for training LLMs with verifiable feedback, featuring 658 fully functional Capture-The-Flag (CTF)-style challenges containerized in Docker with guaranteed reproducibility. To enable rapid scaling without manual intervention, we develop CTF-Forge, an automated pipeline that transforms publicly available artifacts into ready-to-use execution environments in minutes, eliminating weeks of expert configuration traditionally required. We trained LLM-based agents on just 486 high-quality, execution-verified trajectories from CTF-Dojo, achieving up to 11.6% absolute gains over strong baselines across three competitive benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best-performing 32B model reaches 31.9% Pass@1, establishing a new open-weight state-of-the-art that rivals frontier models like DeepSeek-V3-0324 and Gemini-2.5-Flash. By framing CTF-style tasks as a benchmark for executable-agent learning, CTF-Dojo demonstrates that execution-grounded training signals are not only effective but pivotal in advancing high-performance ML agents without dependence on costly proprietary systems.
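To make the "execution-grounded, verifiable feedback" idea concrete, below is a minimal illustrative sketch of the kind of harness such training implies: a containerized CTF-style challenge is launched and an agent's submitted flag is checked for a binary reward. The image name, port, flag format, and helper functions are hypothetical illustrations, not CTF-Dojo's or CTF-Forge's actual interface.

```python
# Illustrative sketch only: execution-verified reward for a CTF-style task.
# Image names, ports, flags, and helpers below are assumptions for illustration.
import subprocess


def start_challenge(image: str, name: str, host_port: int) -> None:
    """Launch a containerized CTF-style challenge via the Docker CLI."""
    subprocess.run(
        ["docker", "run", "-d", "--rm", "--name", name,
         "-p", f"{host_port}:1337", image],
        check=True,
    )


def stop_challenge(name: str) -> None:
    """Tear down the challenge container after the episode."""
    subprocess.run(["docker", "stop", name], check=True)


def verify_flag(submitted: str, expected: str) -> float:
    """Binary, execution-verified reward: 1.0 iff the agent recovered the flag."""
    return 1.0 if submitted.strip() == expected else 0.0


if __name__ == "__main__":
    # Hypothetical challenge image and flag, for illustration only.
    start_challenge("ctf-dojo/example-pwn:latest", "example-pwn", 31337)
    try:
        agent_submission = "flag{example}"  # what the LLM agent would return
        print("reward =", verify_flag(agent_submission, "flag{example}"))
    finally:
        stop_challenge("example-pwn")
```

Under this framing, only trajectories whose final flag check succeeds count as execution-verified, which is consistent with the abstract's filtering of 486 high-quality trajectories.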