Dynamic Risk Assessments for Offensive Cybersecurity Agents
May 23, 2025
Authors: Boyi Wei, Benedikt Stroebl, Jiacen Xu, Joie Zhang, Zhou Li, Peter Henderson
cs.AI
Abstract
Foundation models are increasingly becoming better autonomous programmers,
raising the prospect that they could also automate dangerous offensive
cyber-operations. Current frontier model audits probe the cybersecurity risks
of such agents, but most fail to account for the degrees of freedom available
to adversaries in the real world. In particular, with strong verifiers and
financial incentives, agents for offensive cybersecurity are amenable to
iterative improvement by would-be adversaries. We argue that assessments should
take into account an expanded threat model in the context of cybersecurity,
emphasizing the varying degrees of freedom that an adversary may possess in
stateful and non-stateful environments within a fixed compute budget. We show
that even with a relatively small compute budget (8 H100 GPU hours in our
study), adversaries can improve an agent's cybersecurity capability on
InterCode CTF by more than 40% relative to the baseline, without any
external assistance. These results highlight the need to evaluate agents'
cybersecurity risk in a dynamic manner, painting a more representative picture
of risk.