SafePred: A Predictive Guardrail for Computer-Using Agents via World Models

February 2, 2026
Authors: Yurun Chen, Zeyi Liao, Ping Yin, Taotao Xie, Keting Yin, Shengyu Zhang
cs.AI

Abstract

With the widespread deployment of Computer-using Agents (CUAs) in complex real-world environments, prevalent long-term risks often lead to severe and irreversible consequences. Most existing guardrails for CUAs are reactive: they constrain agent behavior only within the current observation space. These guardrails can prevent immediate short-term risks (e.g., clicking on a phishing link), but they cannot proactively avoid long-term risks. A seemingly reasonable action can lead to high-risk consequences that emerge only after a delay (e.g., clearing logs leaves later audits unable to trace activity), and a reactive guardrail cannot identify such a risk from the current observation space alone. To address these limitations, we propose a predictive guardrail approach whose core idea is to align predicted future risks with current decisions. Based on this approach, we present SafePred, a predictive guardrail framework for CUAs that establishes a risk-to-decision loop to ensure safe agent behavior. SafePred provides two key capabilities: (1) Short- and long-term risk prediction: using safety policies as the basis for risk prediction, SafePred leverages the predictive capability of a world model to generate semantic representations of both short-term and long-term risks, thereby identifying and pruning actions that lead to high-risk states; (2) Decision optimization: SafePred translates predicted risks into actionable safe-decision guidance through step-level interventions and task-level re-planning. Extensive experiments show that SafePred significantly reduces high-risk behaviors, achieving over 97.6% safety performance and improving task utility by up to 21.4% compared with reactive baselines.
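The risk-to-decision loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `RiskPrediction` type, the `toy_world_model` rules, the `guard_step` function, and the 0.5 threshold are all hypothetical names and values chosen for illustration, and the hard-coded rules stand in for a learned world model. The sketch only shows the general shape of the idea: predict short- and long-term risk for each candidate action, prune the high-risk ones, and fall back to re-planning when nothing safe remains.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RiskPrediction:
    short_term: float  # predicted risk right after the action, in [0, 1]
    long_term: float   # predicted risk of delayed consequences, in [0, 1]
    rationale: str     # short textual description of the predicted risk


# Hypothetical world-model interface: (state, action) -> RiskPrediction.
WorldModel = Callable[[str, str], RiskPrediction]


def toy_world_model(state: str, action: str) -> RiskPrediction:
    """Stand-in for a learned world model; the rules are hard-coded for the demo."""
    if "phishing" in action:
        return RiskPrediction(0.9, 0.9, "credential theft")
    if "delete logs" in action:
        # Looks harmless now, but the predicted delayed consequence is risky.
        return RiskPrediction(0.1, 0.8, "later audits cannot trace activity")
    return RiskPrediction(0.05, 0.05, "no policy violation predicted")


def guard_step(
    state: str,
    candidates: List[str],
    world_model: WorldModel,
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Optional[str]:
    """Return the lowest-risk candidate, or None to signal task-level re-planning."""
    safe = []
    for action in candidates:
        pred = world_model(state, action)
        risk = max(pred.short_term, pred.long_term)
        if risk < threshold:  # step-level intervention: prune risky actions
            safe.append((risk, action))
    if not safe:
        return None  # every candidate was pruned, so the caller should re-plan
    return min(safe, key=lambda pair: pair[0])[1]


chosen = guard_step("desktop", ["delete logs in /var/log", "archive logs"], toy_world_model)
needs_replan = guard_step("browser", ["click phishing link"], toy_world_model) is None
```

Taking the maximum of the short- and long-term scores is one simple way to let a delayed risk veto an action that looks safe in the current observation; a real system could weight or combine the two differently.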