POISE：位置感知的不可偵測技能注入於LLM代理

摘要

代理技能提供了一種輕量級機制來擴展通用代理，但其開放格式使其容易遭受技能投毒攻擊。實際上危險的注入必須保持隱形：若執行有效負載會干擾用戶的正常任務，則產生的失敗訊號會引發對技能的檢查。因此，我們透過攻擊成功率來評估攻擊，這要求注入的有效負載在相同試驗中執行，且用戶的任務仍能通過其驗證器。在此視角下，先前的技能投毒攻擊面臨可靠性與隱蔽性之間的權衡：YAML標頭注入能可靠地載入但易被檢查，而將明確惡意指令置於技能正文中的隱蔽性較高之正文注入則較不可靠，因為脫離上下文的指令會引起代理自身的懷疑。我們提出了POISE，一種位置感知攻擊，將觸發器壓縮為單一、看似無害的正文指令，將其放置在可行位置，並使用情境感知生成器使其與鄰近的設置或前置步驟融合。在採用codex+gpt-5.2的Skill-Inject上，POISE達到了89.3%的攻擊成功率，比隨機放置的正文基線高出28.0個百分點，比僅使用YAML的基線高出2.6個百分點，同時保留了正文放置的隱蔽優勢。此隱蔽性是決定性優勢：由於合法的技能正文自然需要特權工具操作，LLM掃描器過度敏感，在四個判斷器及兩個基準測試中平均誤報74.6%的乾淨技能。POISE融入這些誤報中，僅使5.6%的受毒化變體在乾淨基線之上新增高風險警報，使得現有的靜態防禦失效。

English

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting failure signal invites inspection of the skill. We therefore evaluate attacks by Attack Success Rate, which requires the injected payload to execute and the user's task to still pass its verifier in the same trial. Prior skill-poisoning attacks face a reliability-stealth trade-off under this lens: YAML-header injections are reliably loaded but easily inspected, whereas stealthier body injections that place explicit malicious commands in the skill prose are less reliable because out-of-context commands invite the agent's own suspicion. We introduce POISE, a position-aware attack that compresses the trigger into a single, benign-looking body instruction, placing it at a feasible position and using a context-aware generator to blend it with nearby setup or prerequisite steps. On Skill-Inject with codex+gpt-5.2, POISE achieves an 89.3% ASR, 28.0 points above a random-placement body baseline and 2.6 points above a YAML-only baseline, while retaining the stealth advantage of body placement. That stealth is the decisive margin: because legitimate skill bodies naturally require privileged tool operations, LLM scanners are hyper-sensitive, falsely flagging 74.6% of clean skills on average across four judges and both benchmarks. Blending into these false alarms, POISE causes only 5.6% of poisoned variants to gain a new high-risk alert over their clean baselines, rendering current static defenses ineffective.