Cyber-Zero: Training Cybersecurity Agents without Runtime

Abstract


Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, particularly in resolving GitHub issues. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. Cyber-Zero leverages publicly available CTF (Capture The Flag) writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. LLM-based agents trained on trajectories synthesized by Cyber-Zero achieve absolute performance gains of up to 13.1% over their baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, Cyber-Zero-32B, sets a new state of the art among open-weight models, matching the capabilities of proprietary systems such as DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness. These results demonstrate that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.
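The abstract does not spell out how the simulation loop is wired, so the following is only a minimal sketch of the general idea of runtime-free trajectory synthesis, under stated assumptions: two personas alternate turns, an "agent" persona that issues commands and a "simulator" persona that reads a CTF writeup and invents plausible terminal output in place of a real environment. The names `Persona`, `Trajectory`, `synthesize_trajectory`, `max_turns`, the `submit <flag>` stopping rule, and the choice to hide the writeup from the agent are all hypothetical, and the two toy personas below are scripted stand-ins for what would really be LLM calls. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A "persona" is any callable that maps (transcript so far, writeup text) to the
# next turn. In practice each persona would wrap an LLM prompted with a role;
# here it is a hypothetical placeholder type.
Turn = Tuple[str, str]  # (role, text)
Persona = Callable[[List[Turn], str], str]


@dataclass
class Trajectory:
    writeup: str
    turns: List[Turn] = field(default_factory=list)
    solved: bool = False


def synthesize_trajectory(writeup: str, flag: str, agent: Persona,
                          simulator: Persona, max_turns: int = 40) -> Trajectory:
    """Alternate agent actions and simulated observations; no real runtime is used."""
    traj = Trajectory(writeup=writeup)
    for _ in range(max_turns):
        # The agent persona proposes the next command. In this sketch it is not
        # shown the writeup (an assumption), so it must work step by step.
        command = agent(traj.turns, "")
        traj.turns.append(("agent", command))
        if command.strip() == f"submit {flag}":
            traj.solved = True  # assumed stopping rule: correct flag submitted
            break
        # The simulator persona reads the writeup and produces plausible terminal
        # output for the command, standing in for the missing environment.
        observation = simulator(traj.turns, writeup)
        traj.turns.append(("environment", observation))
    return traj


# Toy stand-ins for the two LLM personas, for demonstration only.
def scripted_agent(turns: List[Turn], _writeup: str) -> str:
    plan = ["ls", "cat flag.txt", "submit flag{demo}"]
    steps_taken = sum(1 for role, _ in turns if role == "agent")
    return plan[min(steps_taken, len(plan) - 1)]


def lookup_simulator(turns: List[Turn], writeup: str) -> str:
    # A real simulator would condition on `writeup`; this toy uses a fixed table.
    last_command = turns[-1][1]
    fake_outputs = {"ls": "flag.txt", "cat flag.txt": "flag{demo}"}
    return fake_outputs.get(last_command, "command not found")


if __name__ == "__main__":
    demo = synthesize_trajectory("Writeup: the flag is in flag.txt.", "flag{demo}",
                                 scripted_agent, lookup_simulator)
    for role, text in demo.turns:
        print(f"{role:>11}: {text}")
    print("solved:", demo.solved)
```

Keeping the two roles separate is what lets a writeup, which only describes a solution, be turned into a multi-turn interaction log; one plausible follow-up (again an assumption here) is to keep only trajectories with `solved == True` as fine-tuning data.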
