ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants

Abstract

AI coding assistants like GitHub Copilot are rapidly transforming software development, but their safety remains deeply uncertain, especially in high-stakes domains like cybersecurity. Current red-teaming tools often rely on fixed benchmarks or unrealistic prompts, and so miss many real-world vulnerabilities. We present ASTRA, an automated agent system designed to systematically uncover safety flaws in AI-driven code generation and security guidance systems. ASTRA works in three stages: (1) it builds structured, domain-specific knowledge graphs that model complex software tasks and known weaknesses; (2) guided by these knowledge graphs, it performs online vulnerability exploration of each target model by adaptively probing both its input space (spatial exploration) and its reasoning processes (temporal exploration); and (3) it generates high-quality violation-inducing cases to improve model alignment. Unlike prior methods, ASTRA focuses on realistic inputs, that is, requests that developers might actually make, and uses both offline abstraction-guided domain modeling and online adaptation of the domain knowledge graph to surface corner-case vulnerabilities. Across two major evaluation domains, ASTRA finds 11-66% more issues than existing techniques and produces test cases that lead to 17% more effective alignment training, showing its practical value for building safer AI systems.

