Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates

March 22, 2026
Author: Florin Adrian Chitan
cs.AI

Abstract

Deterministic pre-execution safety gates evaluate whether individual agent actions are compatible with their assigned roles. While effective at per-action authorization, these systems are structurally blind to distributed attacks that decompose harmful intent across multiple individually compliant steps. This paper introduces Session Risk Memory (SRM), a lightweight deterministic module that extends stateless execution gates with trajectory-level authorization. SRM maintains a compact semantic centroid representing the evolving behavioral profile of an agent session and accumulates a risk signal through an exponential moving average (EMA) over baseline-subtracted gate outputs. It operates on the same semantic vector representation as the underlying gate and requires no additional model components, training, or probabilistic inference. We evaluate SRM on a multi-turn benchmark of 80 sessions containing slow-burn exfiltration, gradual privilege escalation, and compliance drift scenarios. ILION+SRM achieves F1 = 1.0000 with a 0% false positive rate (FPR), compared with F1 = 0.9756 and a 5% FPR for stateless ILION, while both systems maintain a 100% detection rate. Critically, SRM eliminates all false positives at a per-turn overhead under 250 microseconds. The framework distinguishes spatial authorization consistency (evaluated per action) from temporal authorization consistency (evaluated over the trajectory), providing a principled basis for session-level safety in agentic systems.
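The abstract compresses SRM's state update into a single sentence. The following is a minimal Python sketch of that idea, not the author's implementation. It assumes that the gate emits one scalar score g_t per turn (higher meaning less role-compatible), that b is the score expected for benign turns, and that the risk signal follows r_t = (1 - alpha) * r_{t-1} + alpha * (g_t - b), with the session blocked once r_t exceeds a threshold. The class name `SessionRiskMemory`, the parameters `alpha`, `baseline` and `threshold`, their default values, and the running-mean centroid update are all hypothetical. Because the abstract does not say how the centroid feeds into the risk decision, the sketch simply keeps the centroid as separate session state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionRiskMemory:
    """Hypothetical sketch of SRM-style session state (not the paper's code).

    Assumptions not stated in the abstract: the gate emits one scalar score
    per turn (higher = less role-compatible), `baseline` is the expected
    benign score, `alpha` is the EMA smoothing factor, and `threshold` is
    the session-level blocking cutoff.
    """
    dim: int
    alpha: float = 0.3
    baseline: float = 0.2
    threshold: float = 0.15
    centroid: List[float] = field(init=False, default_factory=list)
    risk: float = field(init=False, default=0.0)
    turns: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        # Start the semantic centroid at the origin of the embedding space.
        self.centroid = [0.0] * self.dim

    def update(self, embedding: List[float], gate_score: float) -> bool:
        """Fold one turn into the session state; return True if the session should be blocked."""
        if len(embedding) != self.dim:
            raise ValueError("embedding has wrong dimension")
        self.turns += 1
        # Running mean of turn embeddings (assumed centroid update rule).
        self.centroid = [c + (x - c) / self.turns
                         for c, x in zip(self.centroid, embedding)]
        # EMA over the baseline-subtracted gate output.
        excess = gate_score - self.baseline
        self.risk = (1 - self.alpha) * self.risk + self.alpha * excess
        return self.risk > self.threshold


if __name__ == "__main__":
    srm = SessionRiskMemory(dim=2)
    # Each turn scores 0.45, below a hypothetical per-action cutoff of 0.5,
    # so a stateless gate would allow all three turns.
    for emb in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
        blocked = srm.update(emb, gate_score=0.45)
        print(blocked, round(srm.risk, 4))
```

With these illustrative values, the accumulated risk crosses the threshold only on the third turn, even though each turn taken alone is mildly elevated. A session whose scores sit at the baseline accumulates no risk. This is the kind of per-action versus per-trajectory distinction the abstract describes, shown here under the assumed update rule.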