Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

February 5, 2026
Authors: Zhenxiong Yu, Zhi Yang, Zhiheng Jin, Shuhe Wang, Heng Zhang, Yanlin Fei, Lingfeng Zeng, Fangqi Lou, Shuo Zhang, Tu Hu, Jingping Liu, Rongze Chen, Xingyu Zhu, Kunyi Wang, Chaofa Yuan, Xin Guo, Zhaowei Liu, Feipeng Zhang, Jie Huang, Huacan Wang, Ronghao Chen, Liwen Zhang
cs.AI

Abstract

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, Spider-Sense invokes a hierarchical defense mechanism that balances efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S^2Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3%.
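The two-tier screening described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`jaccard`, `screen`), the token-set Jaccard similarity, the single 0.8 threshold, and the `deep_reason` callback standing in for the agent's internal reasoning are all assumptions made for illustration. The sketch also assumes that any content reaching `screen` has already been flagged by the agent's risk sensing, so anything that does not closely match a known pattern is treated as ambiguous.

```python
from typing import Callable, Iterable, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1] (an assumed, cheap similarity measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def screen(
    content: str,
    known_patterns: Iterable[str],
    deep_reason: Callable[[str], bool],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> Tuple[str, str]:
    """Screen content that the agent has already flagged as risky.

    Returns (verdict, tier): verdict is "block" or "allow"; tier records
    whether the cheap match or the deeper reasoning step decided it.
    """
    # Tier 1: lightweight similarity matching against known attack patterns.
    best = max((jaccard(content, p) for p in known_patterns), default=0.0)
    if best >= threshold:
        return ("block", "match")

    # Tier 2: no close match, so escalate the ambiguous case to deeper reasoning.
    # deep_reason is a placeholder that returns True when the content is judged malicious.
    verdict = "block" if deep_reason(content) else "allow"
    return (verdict, "reasoning")
```

In this sketch the expensive `deep_reason` call runs only when the cheap match is inconclusive, which is the efficiency and precision trade-off the abstract attributes to the hierarchical design.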