ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants
August 5, 2025
Authors: Xiangzhe Xu, Guangyu Shen, Zian Su, Siyuan Cheng, Hanxi Guo, Lu Yan, Xuan Chen, Jiasheng Jiang, Xiaolong Jin, Chengpeng Wang, Zhuo Zhang, Xiangyu Zhang
cs.AI
Abstract
AI coding assistants like GitHub Copilot are rapidly transforming software
development, but their safety remains deeply uncertain, especially in
high-stakes domains like cybersecurity. Current red-teaming tools often rely on
fixed benchmarks or unrealistic prompts, missing many real-world
vulnerabilities. We present ASTRA, an automated agent system designed to
systematically uncover safety flaws in AI-driven code generation and security
guidance systems. ASTRA works in three stages: (1) it builds structured
domain-specific knowledge graphs that model complex software tasks and known
weaknesses; (2) it performs online vulnerability exploration of each target
model by adaptively probing both its input space, i.e., the spatial
exploration, and its reasoning processes, i.e., the temporal exploration,
guided by the knowledge graphs; and (3) it generates high-quality
violation-inducing cases to improve model alignment. Unlike prior methods,
ASTRA focuses on realistic inputs, the requests developers might actually
ask, and uses both offline abstraction-guided domain modeling and online domain
knowledge graph adaptation to surface corner-case vulnerabilities. Across two
major evaluation domains, ASTRA finds 11-66% more issues than existing
techniques and produces test cases that lead to 17% more effective alignment
training, showing its practical value for building safer AI systems.
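
To make the three-stage pipeline concrete, below is a minimal Python sketch of the loop the abstract describes. It is an illustration under assumed names, not the authors' implementation: KnowledgeGraph, query_model, violates_policy, and explore are all hypothetical, and both the target model and the safety judge are stubbed out.

from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    # Stage 1: structured domain knowledge -- software tasks linked to known weaknesses.
    edges: dict[str, list[str]] = field(default_factory=dict)

    def add_weakness(self, task: str, weakness: str) -> None:
        self.edges.setdefault(task, []).append(weakness)

    def weaknesses(self, task: str) -> list[str]:
        return self.edges.get(task, [])

def query_model(prompt: str) -> str:
    # Stub for the target coding assistant; a real harness would call its API.
    return f"<model response to: {prompt!r}>"

def violates_policy(response: str) -> bool:
    # Stub safety judge; trivially fires here so the demo produces output.
    # A real system would use a trained classifier or human review.
    return "model response" in response

def explore(kg: KnowledgeGraph, task: str, max_turns: int = 3) -> list[str]:
    # Stage 2: spatial exploration varies the request across weakness nodes;
    # temporal exploration extends the dialogue to probe the reasoning chain.
    # Stage 3: violation-inducing prompts are kept as alignment training cases.
    cases = []
    for weakness in kg.weaknesses(task):
        prompt = f"As a developer, help me {task} (related to {weakness})."
        for _ in range(max_turns):
            if violates_policy(query_model(prompt)):
                cases.append(prompt)
                break
            prompt += " Please walk through your reasoning step by step."
    return cases

if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add_weakness("parse untrusted archive files", "CWE-22 path traversal")
    print(explore(kg, "parse untrusted archive files"))

The point of the sketch is the control structure: the knowledge graph decides what to ask (spatial), the inner loop decides how long to keep probing a single interaction (temporal), and only violation-inducing prompts survive into the alignment training set.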