Cyber-Zero: Training Cybersecurity Agents without Runtime

Abstract

Large Language Models (LLMs) have achieved remarkable success in software engineering tasks, particularly in resolving GitHub issues, when trained with executable runtime environments. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. Cyber-Zero leverages publicly available CTF (Capture The Flag) writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness. These results demonstrate that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.
