POISE: Position-Aware Undetectable Skill Injection for LLM Agents

Abstract

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting failure signal invites inspection of the skill. We therefore evaluate attacks by Attack Success Rate (ASR), which requires that the injected payload execute and that the user's task still pass its verifier in the same trial. Under this lens, prior skill-poisoning attacks face a reliability-stealth trade-off: YAML-header injections are reliably loaded but easily inspected, whereas stealthier body injections, which place explicit malicious commands in the skill prose, are less reliable because out-of-context commands invite the agent's own suspicion. We introduce POISE, a position-aware attack that compresses the trigger into a single, benign-looking body instruction, places it at a feasible position, and uses a context-aware generator to blend it with nearby setup or prerequisite steps. On Skill-Inject with codex+gpt-5.2, POISE achieves an 89.3% ASR, 28.0 percentage points above a random-placement body baseline and 2.6 percentage points above a YAML-only baseline, while retaining the stealth advantage of body placement. That stealth is the decisive margin: because legitimate skill bodies naturally require privileged tool operations, LLM scanners are hypersensitive, falsely flagging 74.6% of clean skills on average across four judges and both benchmarks. Because POISE blends into these false alarms, only 5.6% of poisoned variants gain a new high-risk alert relative to their clean baselines, which renders current static defenses ineffective.
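
The two metrics described above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the authors' evaluation code: the `Trial` record, its field names, the function names, and the toy data are all hypothetical, and the denominator of the new-alert rate (all poisoned variants scored) is our reading of the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    # Hypothetical record of one agent run on a poisoned skill.
    payload_executed: bool  # the injected payload actually ran
    task_passed: bool       # the user's task still passed its verifier


def attack_success_rate(trials):
    """Joint success: payload executed AND user task passed, in the same trial."""
    if not trials:
        raise ValueError("need at least one trial")
    hits = sum(t.payload_executed and t.task_passed for t in trials)
    return hits / len(trials)


def new_alert_rate(pairs):
    """Fraction of poisoned variants flagged high-risk whose clean baseline was not.

    `pairs` holds (clean_flagged, poisoned_flagged) booleans, one per variant.
    Assumption: the denominator is every poisoned variant, flagged or not.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    new_alerts = sum((not clean) and poisoned for clean, poisoned in pairs)
    return new_alerts / len(pairs)


# Toy data, invented purely for illustration.
trials = [
    Trial(payload_executed=True, task_passed=True),    # counts as a success
    Trial(payload_executed=True, task_passed=False),   # payload ran, task derailed
    Trial(payload_executed=False, task_passed=True),   # payload never ran
    Trial(payload_executed=True, task_passed=True),    # counts as a success
]
pairs = [(True, True), (False, True), (True, False), (False, False)]

asr = attack_success_rate(trials)
alerts = new_alert_rate(pairs)
print(f"ASR = {asr:.2f}, new-alert rate = {alerts:.2f}")
```

In this sketch a trial whose payload runs but whose task fails does not count toward ASR, which mirrors the requirement that the attack stay invisible. Likewise, a poisoned variant whose clean baseline was already flagged adds no new alert, which is how a high false-positive rate on clean skills can mask an injection.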