
Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

February 5, 2026
Authors: Zhenxiong Yu, Zhi Yang, Zhiheng Jin, Shuhe Wang, Heng Zhang, Yanlin Fei, Lingfeng Zeng, Fangqi Lou, Shuo Zhang, Tu Hu, Jingping Liu, Rongze Chen, Xingyu Zhu, Kunyi Wang, Chaofa Yuan, Xin Guo, Zhaowei Liu, Feipeng Zhang, Jie Huang, Huacan Wang, Ronghao Chen, Liwen Zhang
cs.AI

Abstract

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, and so have the security challenges they face. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, Spider-Sense invokes a hierarchical defense mechanism that balances efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S²Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3%.
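The two-tier screening described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the token-level Jaccard similarity (standing in for whatever similarity measure the paper uses), the thresholds `block_at` and `allow_below`, and the `deep_reasoner` callback are all hypothetical, chosen only to show how an event-driven trigger, a cheap pattern match, and an escalation path might fit together.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in for the paper's matcher)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class Verdict:
    action: str  # "allow" or "block"
    stage: str   # "not_triggered", "similarity", or "deep_reasoning"


def screen(
    content: str,
    risk_sensed: bool,
    known_attacks: Iterable[str],
    deep_reasoner: Callable[[str], bool],
    block_at: float = 0.8,     # hypothetical threshold: clear match to a known attack
    allow_below: float = 0.2,  # hypothetical threshold: clearly unlike any known attack
) -> Verdict:
    # Event-driven: no screening cost unless the agent itself senses risk.
    if not risk_sensed:
        return Verdict("allow", "not_triggered")

    # Tier 1: lightweight similarity matching against known attack patterns.
    best = max((jaccard(content, p) for p in known_attacks), default=0.0)
    if best >= block_at:
        return Verdict("block", "similarity")
    if best < allow_below:
        return Verdict("allow", "similarity")

    # Tier 2: ambiguous case, escalate to the (assumed) internal reasoning step.
    risky = deep_reasoner(content)
    return Verdict("block" if risky else "allow", "deep_reasoning")
```

In this sketch `deep_reasoner` would be backed by the agent's own model rather than an external guard model, in keeping with the abstract's claim of eliminating reliance on external models; how the agent sets `risk_sensed` is left out entirely.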