ChatPaper.aiChatPaper

ChatInject:濫用聊天模板進行LLM代理中的提示注入

ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

September 26, 2025
作者: Hwan Chang, Yonghyun Jun, Hwanhee Lee
cs.AI

摘要

隨著基於大型語言模型(LLM)的代理在外部環境中的廣泛部署,新的攻擊面也隨之出現,為敵對操縱提供了機會。其中一個主要威脅是間接提示注入,攻擊者將惡意指令嵌入外部環境的輸出中,導致代理將其解釋並執行,彷彿這些指令是合法的提示。雖然先前的研究主要集中在純文本注入攻擊上,但我們發現了一個重要卻未被充分探索的脆弱性:LLM對結構化聊天模板的依賴性,以及其易受多輪對話中說服性上下文操縱的影響。為此,我們提出了ChatInject,這是一種將惡意負載格式化以模仿原生聊天模板的攻擊方式,從而利用模型固有的指令遵循傾向。在此基礎上,我們開發了一種基於說服的多輪變體,通過多輪對話引導代理接受並執行原本可疑的操作。通過對前沿LLM的全面實驗,我們展示了三個關鍵發現:(1)ChatInject的平均攻擊成功率顯著高於傳統的提示注入方法,在AgentDojo上從5.18%提升至32.05%,在InjecAgent上從15.13%提升至45.90%,其中多輪對話在InjecAgent上的平均成功率達到52.33%,表現尤為突出;(2)基於聊天模板的負載在模型間具有強遷移性,即使面對未知模板結構的閉源LLM,仍能保持有效性;(3)現有的基於提示的防禦措施對這種攻擊方式,尤其是多輪變體,大多無效。這些發現揭示了當前代理系統中的脆弱性。
English
The growing deployment of large language model (LLM) based agents that interact with external environments has created new attack surfaces for adversarial manipulation. One major threat is indirect prompt injection, where attackers embed malicious instructions in external environment output, causing agents to interpret and execute them as if they were legitimate prompts. While previous research has focused primarily on plain-text injection attacks, we find a significant yet underexplored vulnerability: LLMs' dependence on structured chat templates and their susceptibility to contextual manipulation through persuasive multi-turn dialogues. To this end, we introduce ChatInject, an attack that formats malicious payloads to mimic native chat templates, thereby exploiting the model's inherent instruction-following tendencies. Building on this foundation, we develop a persuasion-driven Multi-turn variant that primes the agent across conversational turns to accept and execute otherwise suspicious actions. Through comprehensive experiments across frontier LLMs, we demonstrate three critical findings: (1) ChatInject achieves significantly higher average attack success rates than traditional prompt injection methods, improving from 5.18% to 32.05% on AgentDojo and from 15.13% to 45.90% on InjecAgent, with multi-turn dialogues showing particularly strong performance at average 52.33% success rate on InjecAgent, (2) chat-template-based payloads demonstrate strong transferability across models and remain effective even against closed-source LLMs, despite their unknown template structures, and (3) existing prompt-based defenses are largely ineffective against this attack approach, especially against Multi-turn variants. These findings highlight vulnerabilities in current agent systems.
PDF42September 30, 2025