POISE: Position-Aware Undetectable Skill Injection against LLM Agents

Abstract

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting failure signal invites inspection of the skill. We therefore evaluate attacks by Attack Success Rate (ASR), which requires that the injected payload execute and that the user's task still pass its verifier in the same trial. Under this lens, prior skill-poisoning attacks face a reliability-stealth trade-off: YAML-header injections are reliably loaded but easily inspected, whereas stealthier body injections, which place explicit malicious commands in the skill prose, are less reliable because out-of-context commands arouse the agent's own suspicion. We introduce POISE, a position-aware attack that compresses the trigger into a single, benign-looking body instruction, places it at a feasible position, and uses a context-aware generator to blend it with nearby setup or prerequisite steps. On Skill-Inject with codex+gpt-5.2, POISE achieves an 89.3% ASR, 28.0 percentage points above a random-placement body baseline and 2.6 percentage points above a YAML-only baseline, while retaining the stealth advantage of body placement. That stealth is the decisive margin: because legitimate skill bodies naturally require privileged tool operations, LLM scanners are hypersensitive, falsely flagging 74.6% of clean skills on average across four judges and both benchmarks. Blending into these false alarms, POISE causes only 5.6% of poisoned variants to gain a new high-risk alert over their clean baselines, rendering current static defenses ineffective.
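
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical per-trial record with two boolean fields and a hypothetical per-skill pair of scanner verdicts. The names `Trial`, `attack_success_rate`, and `new_high_risk_alert_rate` are ours, not the paper's, and the reading of "new high-risk alert" as "poisoned variant flagged while its clean baseline was not" is an assumption. The sample values are illustrative only and are not results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Trial:
    """One agent run on a poisoned skill (hypothetical record layout)."""
    payload_executed: bool  # did the injected payload run?
    task_passed: bool       # did the user's task pass its verifier?


def attack_success_rate(trials: Iterable[Trial]) -> float:
    """Fraction of trials where the payload ran AND the task still passed."""
    trials = list(trials)
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(t.payload_executed and t.task_passed for t in trials)
    return hits / len(trials)


def new_high_risk_alert_rate(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of poisoned variants flagged high-risk when the clean
    baseline was not. Each pair is (clean_flagged, poisoned_flagged).
    This interpretation of "new alert" is an assumption."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    new_alerts = sum((not clean) and poisoned for clean, poisoned in pairs)
    return new_alerts / len(pairs)


# Illustrative data only: 2 of 4 trials satisfy both conditions.
demo_trials = [
    Trial(payload_executed=True, task_passed=True),
    Trial(payload_executed=True, task_passed=False),   # payload ran, task broke
    Trial(payload_executed=False, task_passed=True),   # payload never ran
    Trial(payload_executed=True, task_passed=True),
]
# Illustrative data only: 1 of 4 variants gains a new alert.
demo_pairs = [(True, True), (False, True), (False, False), (True, True)]

print(attack_success_rate(demo_trials))
print(new_high_risk_alert_rate(demo_pairs))
```

Counting a trial only when both conditions hold is what separates this metric from a plain payload-execution rate: a payload that runs but breaks the user's task counts as a failure, since the broken task would prompt inspection of the skill. In the second function, a variant whose clean baseline was already flagged contributes no new alert, which is how a high false-positive rate on clean skills can mask a poisoned one.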