Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates

Abstract

Deterministic pre-execution safety gates evaluate whether individual agent actions are compatible with their assigned roles. While effective at per-action authorization, these systems are structurally blind to distributed attacks that decompose harmful intent into multiple steps, each of which is compliant on its own. This paper introduces Session Risk Memory (SRM), a lightweight deterministic module that extends stateless execution gates with trajectory-level authorization. SRM maintains a compact semantic centroid representing the evolving behavioral profile of an agent session and accumulates a risk signal through an exponential moving average over baseline-subtracted gate outputs. It operates on the same semantic vector representation as the underlying gate and requires no additional model components, training, or probabilistic inference. We evaluate SRM on a multi-turn benchmark of 80 sessions containing slow-burn exfiltration, gradual privilege escalation, and compliance drift scenarios. ILION+SRM achieves F1 = 1.0000 with a 0% false positive rate (FPR), compared to stateless ILION at F1 = 0.9756 with a 5% FPR; both systems maintain a 100% detection rate. Critically, SRM eliminates all false positives at a per-turn overhead of under 250 microseconds. The framework introduces a conceptual distinction between spatial authorization consistency (evaluated per action) and temporal authorization consistency (evaluated over the trajectory), providing a principled basis for session-level safety in agentic systems.
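The accumulation mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not give SRM's update rules or parameters, so everything concrete here is an assumption made for illustration: the class name `SessionRiskMemory`, the smoothing factor `alpha`, the `baseline` and `threshold` values, the use of a running mean for the centroid, and the rule that a session is flagged once the accumulated risk exceeds the threshold. It is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRiskMemory:
    """Illustrative session-level risk accumulator (hypothetical, not the paper's code)."""

    alpha: float = 0.3       # EMA smoothing factor (assumed value)
    baseline: float = 0.2    # gate score expected for benign actions (assumed value)
    threshold: float = 0.25  # flag the session above this accumulated risk (assumed value)
    risk: float = 0.0        # accumulated risk signal
    turns: int = 0           # number of actions seen in this session
    centroid: list = field(default_factory=list)  # running mean of action embeddings

    def update(self, embedding, gate_score):
        """Fold one action into the session state; return True if the session is flagged."""
        self.turns += 1
        if not self.centroid:
            self.centroid = [float(x) for x in embedding]
        else:
            # Incremental mean: c_t = c_{t-1} + (e_t - c_{t-1}) / t
            self.centroid = [
                c + (e - c) / self.turns for c, e in zip(self.centroid, embedding)
            ]
        # Exponential moving average over the baseline-subtracted gate output.
        excess = gate_score - self.baseline
        self.risk = (1.0 - self.alpha) * self.risk + self.alpha * excess
        return self.risk > self.threshold


def run_session(steps, **params):
    """Replay (embedding, gate_score) pairs and return the per-turn flag decisions."""
    srm = SessionRiskMemory(**params)
    return [srm.update(embedding, score) for embedding, score in steps]
```

Under these assumed parameters, a session whose gate scores stay at or below the baseline never accumulates positive risk, while a session whose scores creep upward turn by turn crosses the threshold after a few steps, even when no single score would be alarming in isolation. The update is a handful of arithmetic operations per turn; how the centroid feeds into the decision is not specified in the abstract, so the sketch only maintains it.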
