Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

May 29, 2026
Authors: Shuhao Zhang, Jiarui Li, Qi Cao, Ruiyi Zhang, Pengtao Xie
cs.AI

Abstract

Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector's blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector's per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock time). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock time by 40% relative to an always-on GPT-4o judge, at the cost of a 5.1-percentage-point drop in benign utility. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.
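
To make the allocation idea concrete, here is a minimal Python sketch of per-request detector selection with a single threshold. It is not the authors' implementation: the abstract does not specify the predictor, the input features, or the similarity measure, so the k-nearest-neighbour estimate, cosine similarity, the `PastCase` record, and the names `predict` and `allocate` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class PastCase:
    """One historical request (hypothetical record layout)."""
    embedding: list[float]        # assumed feature vector for the request
    correct: dict[str, bool]      # detector name -> was its verdict right?
    latency_ms: dict[str, float]  # detector name -> observed latency


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(history: list[PastCase], query: list[float], k: int = 3):
    """Estimate (reliability, latency) per detector from the k most similar past cases."""
    if not history:
        raise ValueError("history must contain at least one past case")
    neighbors = sorted(
        history, key=lambda c: cosine(c.embedding, query), reverse=True
    )[:k]
    n = len(neighbors)
    return {
        d: (
            sum(c.correct[d] for c in neighbors) / n,     # fraction of correct verdicts
            sum(c.latency_ms[d] for c in neighbors) / n,  # mean observed latency
        )
        for d in neighbors[0].correct
    }


def allocate(history: list[PastCase], query: list[float], threshold: float, k: int = 3):
    """Run the detector predicted most reliable; escalate to the judge if it falls below threshold."""
    estimates = predict(history, query, k)
    # Prefer higher predicted reliability; break ties with lower predicted latency.
    best = max(estimates, key=lambda d: (estimates[d][0], -estimates[d][1]))
    return {"run": [best], "escalate_to_judge": estimates[best][0] < threshold}
```

In this sketch, raising `threshold` sends more requests to the (assumed slower) LLM judge, while lowering it trusts the cheap detectors more often. That is one simple way to read the single safety-utility knob described above, though the paper's actual decision rule may differ.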