Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
August 25, 2025
Authors: Terry Yue Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang
cs.AI
Abstract
Large language models (LLMs) have demonstrated exceptional capabilities when
trained within executable runtime environments, notably excelling at software
engineering tasks through verified feedback loops. Yet, scalable and
generalizable execution-grounded environments remain scarce, limiting progress
in training more capable ML agents. We introduce CTF-Dojo, the first
large-scale executable runtime tailored for training LLMs with verifiable
feedback, featuring 658 fully functional Capture-The-Flag (CTF)-style
challenges containerized in Docker with guaranteed reproducibility. To enable
rapid scaling without manual intervention, we develop CTF-Forge, an automated
pipeline that transforms publicly available artifacts into ready-to-use
execution environments in minutes, eliminating weeks of expert configuration
traditionally required. We train LLM-based agents on just 486 high-quality,
execution-verified trajectories from CTF-Dojo, achieving up to 11.6% absolute
gains over strong baselines across three competitive benchmarks: InterCode-CTF,
NYU CTF Bench, and Cybench. Our best-performing 32B model reaches 31.9% Pass@1,
establishing a new open-weight state-of-the-art that rivals frontier models
like DeepSeek-V3-0324 and Gemini-2.5-Flash. By framing CTF-style tasks as a
benchmark for executable-agent learning, CTF-Dojo demonstrates that
execution-grounded training signals are not only effective but pivotal in
advancing high-performance ML agents without dependence on costly proprietary
systems.
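The "execution-verified trajectories" above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's actual code: the `Trajectory` class, `filter_verified` function, and the example flag values are all invented names. The idea is simply that a trajectory counts as training data only if its submitted flag matches the challenge's ground-truth flag, giving an execution-grounded success signal rather than a model-judged one.

```python
# Hypothetical sketch of execution-verified trajectory filtering
# (illustrative only; names and data are invented, not from CTF-Dojo).
from dataclasses import dataclass


@dataclass
class Trajectory:
    challenge_id: str
    submitted_flag: str


def filter_verified(trajectories, ground_truth):
    """Keep only trajectories whose submitted flag matches the
    challenge's known flag -- a verifiable, execution-grounded signal."""
    return [
        t for t in trajectories
        if ground_truth.get(t.challenge_id) == t.submitted_flag
    ]


# Example: two runs on the same challenge; only the correct flag survives.
flags = {"web-101": "ctf{example}"}
runs = [
    Trajectory("web-101", "ctf{example}"),
    Trajectory("web-101", "ctf{wrong}"),
]
kept = filter_verified(runs, flags)
print(len(kept))  # → 1
```

In practice the check would run inside the Docker-containerized challenge environment rather than against a static flag table, but the filtering logic is the same: only verified successes become training data.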