ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants
August 5, 2025
Authors: Xiangzhe Xu, Guangyu Shen, Zian Su, Siyuan Cheng, Hanxi Guo, Lu Yan, Xuan Chen, Jiasheng Jiang, Xiaolong Jin, Chengpeng Wang, Zhuo Zhang, Xiangyu Zhang
cs.AI
Abstract
AI coding assistants like GitHub Copilot are rapidly transforming software
development, but their safety remains deeply uncertain, especially in
high-stakes domains like cybersecurity. Current red-teaming tools often rely on
fixed benchmarks or unrealistic prompts, missing many real-world
vulnerabilities. We present ASTRA, an automated agent system designed to
systematically uncover safety flaws in AI-driven code generation and security
guidance systems. ASTRA works in three stages: (1) it builds structured
domain-specific knowledge graphs that model complex software tasks and known
weaknesses; (2) it performs online vulnerability exploration of each target
model by adaptively probing both its input space, i.e., the spatial
exploration, and its reasoning processes, i.e., the temporal exploration,
guided by the knowledge graphs; and (3) it generates high-quality
violation-inducing cases to improve model alignment. Unlike prior methods,
ASTRA focuses on realistic inputs, the requests developers might actually
ask, and uses both offline abstraction-guided domain modeling and online domain
knowledge graph adaptation to surface corner-case vulnerabilities. Across two
major evaluation domains, ASTRA finds 11-66% more issues than existing
techniques and produces test cases that lead to 17% more effective alignment
training, showing its practical value for building safer AI systems.
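
To make the three-stage pipeline concrete, below is a minimal Python sketch of the loop the abstract describes. It is an illustration under assumed names, not the authors' implementation: KnowledgeGraph, query_model, violates_policy, and explore are all hypothetical, and both the target model and the safety judge are stubbed out.

from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    # Stage 1: structured domain knowledge -- software tasks linked to known weaknesses.
    edges: dict[str, list[str]] = field(default_factory=dict)

    def add_weakness(self, task: str, weakness: str) -> None:
        self.edges.setdefault(task, []).append(weakness)

    def weaknesses(self, task: str) -> list[str]:
        return self.edges.get(task, [])

def query_model(prompt: str) -> str:
    # Stub for the target coding assistant; a real harness would call its API.
    return f"<model response to: {prompt!r}>"

def violates_policy(response: str) -> bool:
    # Stub safety judge; trivially fires here so the demo produces output.
    # A real system would use a trained classifier or human review.
    return "model response" in response

def explore(kg: KnowledgeGraph, task: str, max_turns: int = 3) -> list[str]:
    # Stage 2: spatial exploration varies the request across weakness nodes;
    # temporal exploration extends the dialogue to probe the reasoning chain.
    # Stage 3: violation-inducing prompts are kept as alignment training cases.
    cases = []
    for weakness in kg.weaknesses(task):
        prompt = f"As a developer, help me {task} (related to {weakness})."
        for _ in range(max_turns):
            if violates_policy(query_model(prompt)):
                cases.append(prompt)
                break
            prompt += " Please walk through your reasoning step by step."
    return cases

if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add_weakness("parse untrusted archive files", "CWE-22 path traversal")
    print(explore(kg, "parse untrusted archive files"))

The point of the sketch is the control structure: the knowledge graph decides what to ask (spatial), the inner loop decides how long to keep probing a single interaction (temporal), and only violation-inducing prompts survive into the alignment training set.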