POISE: Position-Aware, Undetectable Skill Injection against LLM Agents

Abstract

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must go unnoticed: if executing the payload derails the user's legitimate task, the resulting failure signal invites inspection of the skill. We therefore evaluate attacks by Attack Success Rate (ASR), which counts a trial as a success only if the injected payload executes and the user's task still passes its verifier in that same trial. Under this lens, prior skill-poisoning attacks face a trade-off between reliability and stealth. YAML-header injections are reliably loaded but easily inspected. Body injections are stealthier, but they place explicit malicious commands in the skill prose, and such out-of-context commands are less reliable because they arouse the agent's own suspicion. We introduce POISE (Position-Aware Injection Strategy), an attack that compresses the trigger into a single, benign-looking body instruction, places it at a feasible position, and uses a context-aware generator to blend it with nearby setup or prerequisite steps. On the Skill-Inject dataset with codex+gpt-5.2, POISE achieves an ASR of 89.3%, which is 28.0 percentage points above a random-placement body baseline and 2.6 percentage points above a YAML-only baseline, while retaining the stealth advantage of body placement. That stealth is the decisive margin. Because legitimate skill bodies naturally contain privileged tool operations, LLM scanners are oversensitive: averaged across four judges and both benchmarks, they falsely flag 74.6% of clean skills. POISE blends into these false alarms, so only 5.6% of poisoned variants gain a new high-risk alert relative to their clean baselines, which renders current static defenses ineffective.
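
The joint success criterion behind ASR, and the "new high-risk alert" comparison against a clean baseline, can be illustrated with a minimal Python sketch. Everything named here (`Trial`, `attack_success_rate`, `new_high_risk_alert_rate`, and the representation of scanner output as a set of alert labels) is a hypothetical illustration rather than the authors' evaluation code; in particular, treating "new alert" as a set difference is only one plausible reading of the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One evaluation trial (hypothetical record layout)."""
    payload_executed: bool  # did the injected payload run?
    task_passed: bool       # did the user's task still pass its verifier?


def attack_success_rate(trials):
    """Fraction of trials where the payload ran AND the task passed."""
    if not trials:
        raise ValueError("need at least one trial")
    successes = sum(1 for t in trials if t.payload_executed and t.task_passed)
    return successes / len(trials)


def new_high_risk_alert_rate(pairs):
    """Fraction of (clean, poisoned) pairs where the poisoned variant
    raises at least one high-risk alert absent from its clean baseline.

    Each element of `pairs` is (clean_alerts, poisoned_alerts), both sets
    of alert labels. This set-difference reading is an assumption.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    gained = sum(1 for clean, poisoned in pairs if poisoned - clean)
    return gained / len(pairs)


if __name__ == "__main__":
    trials = [
        Trial(payload_executed=True, task_passed=True),    # counts
        Trial(payload_executed=True, task_passed=False),   # task derailed
        Trial(payload_executed=False, task_passed=True),   # payload ignored
        Trial(payload_executed=True, task_passed=True),    # counts
    ]
    print(attack_success_rate(trials))  # 0.5
```

Under this definition, a payload that executes but breaks the user's task does not count as a success, which is what ties the metric to stealth as well as reliability.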