Training Language Model Agents to Find Vulnerabilities with CTF-Dojo

Abstract

Large language models (LLMs) have demonstrated exceptional capabilities when trained within executable runtime environments, notably excelling at software engineering tasks through verified feedback loops. Yet scalable and generalizable execution-grounded environments remain scarce, limiting progress in training more capable ML agents. We introduce CTF-Dojo, the first large-scale executable runtime tailored for training LLMs with verifiable feedback, featuring 658 fully functional Capture-The-Flag (CTF)-style challenges containerized in Docker with guaranteed reproducibility. To enable rapid scaling without manual intervention, we develop CTF-Forge, an automated pipeline that transforms publicly available artifacts into ready-to-use execution environments in minutes, eliminating the weeks of expert configuration traditionally required. We train LLM-based agents on just 486 high-quality, execution-verified trajectories from CTF-Dojo, achieving up to 11.6% absolute gains over strong baselines across three competitive benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best-performing 32B model reaches 31.9% Pass@1, establishing a new open-weight state of the art that rivals frontier models such as DeepSeek-V3-0324 and Gemini-2.5-Flash. By framing CTF-style tasks as a benchmark for executable-agent learning, CTF-Dojo demonstrates that execution-grounded training signals are not only effective but pivotal in advancing high-performance ML agents without dependence on costly proprietary systems.
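
The abstract reports results as Pass@1 but does not describe how the metric is computed. As a rough illustration only, the sketch below uses the widely used unbiased pass@k estimator; for k = 1 it reduces to the fraction of sampled attempts that succeed on each task, averaged over tasks. The function names and the `(attempts, correct)` input format are hypothetical and are not taken from CTF-Dojo.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: number of sampled attempts, c: number of attempts that captured
    the flag, k: attempt budget being scored. (Hypothetical helper.)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that k attempts drawn without replacement all fail.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over tasks; each entry is (attempts, correct attempts)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, a task solved in 1 of 4 sampled attempts contributes `pass_at_k(4, 1, 1) == 0.25` to the benchmark average.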
