Session Risk Memory (SRM): A Temporal Authorization Mechanism for Deterministic Pre-Execution Safety Gates

Abstract

Deterministic pre-execution safety gates evaluate whether individual agent actions are compatible with their assigned roles. While effective at per-action authorization, these systems are structurally blind to distributed attacks that decompose harmful intent across multiple individually compliant steps. This paper introduces Session Risk Memory (SRM), a lightweight deterministic module that extends stateless execution gates with trajectory-level authorization. SRM maintains a compact semantic centroid representing the evolving behavioral profile of an agent session and accumulates a risk signal through an exponential moving average over baseline-subtracted gate outputs. It operates on the same semantic vector representation as the underlying gate, requiring no additional model components, training, or probabilistic inference. We evaluate SRM on a multi-turn benchmark of 80 sessions containing slow-burn exfiltration, gradual privilege escalation, and compliance drift scenarios. Results show that ILION+SRM achieves F1 = 1.0000 with a 0% false positive rate (FPR), compared to stateless ILION at F1 = 0.9756 with a 5% FPR, while both systems maintain a 100% detection rate. Critically, SRM eliminates all false positives with a per-turn overhead under 250 microseconds. The framework introduces a conceptual distinction between spatial authorization consistency (evaluated per action) and temporal authorization consistency (evaluated over a trajectory), providing a principled basis for session-level safety in agentic systems.
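To make the accumulation mechanism concrete, the following is a minimal sketch of a session-level risk accumulator built from the two ingredients named above: a running semantic centroid and an exponential moving average over baseline-subtracted gate outputs. It is an illustration under stated assumptions, not the implementation evaluated here. The class name `SessionRiskMemory`, the parameters `alpha`, `baseline`, and `threshold`, their default values, and the clipping of negative excess at zero are all hypothetical. The abstract does not specify how the centroid enters the decision, so the sketch only maintains it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionRiskMemory:
    """Hypothetical sketch of a session-level risk accumulator."""
    alpha: float = 0.3       # EMA smoothing factor (assumed value)
    baseline: float = 0.2    # gate output expected for benign actions (assumed)
    threshold: float = 0.2   # session-level flag threshold (assumed)
    risk: float = 0.0        # accumulated risk signal
    turns: int = 0
    centroid: List[float] = field(default_factory=list)

    def update(self, embedding: List[float], gate_score: float) -> bool:
        """Fold one turn into the session state; return True if the session is flagged."""
        self.turns += 1
        if not self.centroid:
            self.centroid = list(embedding)
        else:
            # Incremental mean of all action embeddings seen so far.
            self.centroid = [
                c + (e - c) / self.turns
                for c, e in zip(self.centroid, embedding)
            ]
        # Baseline subtraction; clipping at zero is an assumption of this sketch.
        excess = max(0.0, gate_score - self.baseline)
        # Exponential moving average of the baseline-subtracted gate output.
        self.risk = (1.0 - self.alpha) * self.risk + self.alpha * excess
        return self.risk >= self.threshold


# Each action scores 0.45, which a per-action gate with a cutoff of, say, 0.5
# would allow. The accumulated risk still crosses the session threshold.
srm = SessionRiskMemory()
flags = [srm.update([1.0, 0.0], 0.45) for _ in range(6)]
print(flags)  # the session is flagged from the fifth turn onward
```

With these assumed parameters, a run of actions that each stay under a per-action cutoff still pushes the accumulated risk over the session threshold after a few turns, while actions scoring at or below the baseline contribute nothing. This is the temporal counterpart to per-action (spatial) authorization described above.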