Cyber-Zero: Training Cybersecurity Agents without Runtime

Abstract

Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, particularly in resolving GitHub issues. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. Cyber-Zero leverages publicly available Capture The Flag (CTF) writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness. These results demonstrate that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.
