Training Language Model Agents to Find Vulnerabilities with CTF-Dojo

Abstract

Large language models (LLMs) have demonstrated exceptional capabilities when trained within executable runtime environments, notably excelling at software engineering tasks through verified feedback loops. Yet scalable and generalizable execution-grounded environments remain scarce, limiting progress in training more capable ML agents. We introduce CTF-Dojo, the first large-scale executable runtime tailored for training LLMs with verifiable feedback, featuring 658 fully functional Capture-The-Flag (CTF)-style challenges containerized in Docker with guaranteed reproducibility. To enable rapid scaling without manual intervention, we develop CTF-Forge, an automated pipeline that transforms publicly available artifacts into ready-to-use execution environments in minutes, eliminating the weeks of expert configuration traditionally required. We train LLM-based agents on just 486 high-quality, execution-verified trajectories from CTF-Dojo, achieving absolute gains of up to 11.6% over strong baselines across three competitive benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best-performing 32B model reaches 31.9% Pass@1, establishing a new open-weight state of the art that rivals frontier models such as DeepSeek-V3-0324 and Gemini-2.5-Flash. By framing CTF-style tasks as a benchmark for executable-agent learning, CTF-Dojo demonstrates that execution-grounded training signals are not only effective but pivotal in advancing high-performance ML agents without dependence on costly proprietary systems.
