AgentSys：通过显式分层内存管理实现安全动态的LLM智能体系统

摘要

间接提示注入通过将恶意指令嵌入外部内容威胁大语言模型智能体，导致未授权操作和数据窃取。LLM智能体通过上下文窗口维持工作记忆，该窗口存储交互历史以支持决策。传统智能体 indiscriminately 将所有工具输出和推理痕迹累积于此内存，形成两大关键漏洞：(1)注入指令在工作流中持续存在，使攻击者获得多次操纵行为的机会；(2)冗余非必要内容会降低决策能力。现有防御方案将膨胀的内存视为既定事实，侧重于保持韧性，而非通过减少非必要积累来预防攻击。我们提出AgentSys框架，通过显式内存管理防御间接提示注入。受操作系统进程内存隔离机制启发，AgentSys采用分层架构：主智能体为工具调用生成工作智能体，每个工作智能体在隔离上下文中运行，并可生成嵌套子智能体处理子任务。外部数据和子任务痕迹永不进入主智能体内存，仅通过确定性JSON解析传输经模式验证的返回值。消融实验表明，单凭隔离机制即可将攻击成功率降至2.19%，结合验证器/清理器的动态事件触发检查（其开销随操作数而非上下文长度增长）能进一步提升防御效果。在AgentDojo和ASB基准测试中，AgentSys分别实现0.78%和4.25%的攻击成功率，同时在良性任务效用上较无防御基线略有提升。该框架对自适应攻击具备鲁棒性，且适用于多种基础模型，证明显式内存管理能实现安全动态的LLM智能体架构。代码已开源：https://github.com/ruoyaow/agentsys-memory。

English

Indirect prompt injection threatens LLM agents by embedding malicious instructions in external content, enabling unauthorized actions and data theft. LLM agents maintain working memory through their context window, which stores interaction history for decision-making. Conventional agents indiscriminately accumulate all tool outputs and reasoning traces in this memory, creating two critical vulnerabilities: (1) injected instructions persist throughout the workflow, granting attackers multiple opportunities to manipulate behavior, and (2) verbose, non-essential content degrades decision-making capabilities. Existing defenses treat bloated memory as given and focus on remaining resilient, rather than reducing unnecessary accumulation to prevent the attack. We present AgentSys, a framework that defends against indirect prompt injection through explicit memory management. Inspired by process memory isolation in operating systems, AgentSys organizes agents hierarchically: a main agent spawns worker agents for tool calls, each running in an isolated context and able to spawn nested workers for subtasks. External data and subtask traces never enter the main agent's memory; only schema-validated return values can cross boundaries through deterministic JSON parsing. Ablations show isolation alone cuts attack success to 2.19%, and adding a validator/sanitizer further improves defense with event-triggered checks whose overhead scales with operations rather than context length. On AgentDojo and ASB, AgentSys achieves 0.78% and 4.25% attack success while slightly improving benign utility over undefended baselines. It remains robust to adaptive attackers and across multiple foundation models, showing that explicit memory management enables secure, dynamic LLM agent architectures. Our code is available at: https://github.com/ruoyaow/agentsys-memory.

AgentSys：通过显式分层内存管理实现安全动态的LLM智能体系统

AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

摘要

Support