Jailbreaking with Universal Multi-Prompts
February 3, 2025
Authors: Yu-Ling Hsu, Hsuan Su, Shang-Tse Chen
cs.AI
Abstract
Large language models (LLMs) have seen rapid development in recent years,
revolutionizing various applications and significantly enhancing convenience
and productivity. However, alongside their impressive capabilities, ethical
concerns and new types of attacks, such as jailbreaking, have emerged. While
most prompting techniques focus on optimizing adversarial inputs for individual
cases, resulting in higher computational costs when dealing with large
datasets, less research has addressed the more general setting of training a
universal attacker that can transfer to unseen tasks. In this paper, we
introduce JUMP, a prompt-based method designed to jailbreak LLMs using
universal multi-prompts. We also adapt our approach for defense, which we term
DUMP. Experimental results demonstrate that our method for optimizing universal
multi-prompts outperforms existing techniques.
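To make the "universal multi-prompts" framing concrete, below is a minimal sketch of what optimizing a shared pool of adversarial prompts over a batch of instructions could look like. This is an illustrative greedy hill-climb, not the authors' JUMP algorithm: the scoring stub, the mutation rule, and all names (score_prompt, mutate, optimize_universal_prompts) are hypothetical, and a real attack would score candidates by querying the target LLM and a judge model rather than using a toy function.

```python
# Illustrative sketch only: a toy greedy search over a pool of universal
# adversarial prompts. This is NOT the JUMP algorithm from the paper;
# every helper below is a hypothetical stand-in.
import random
from typing import List, Tuple

def score_prompt(prompt: str, instructions: List[str]) -> float:
    """Toy objective standing in for attack success rate.

    A real attacker would attach `prompt` to each instruction, query the
    target LLM, and have a judge decide whether each response constitutes
    a successful jailbreak.
    """
    rng = random.Random(sum(map(ord, prompt)))  # deterministic toy score
    return sum(rng.random() for _ in instructions) / len(instructions)

def mutate(prompt: str, vocab: List[str]) -> str:
    """Hypothetical mutation: swap or append a token from a small vocabulary."""
    tokens = prompt.split()
    if tokens and random.random() < 0.5:
        tokens[random.randrange(len(tokens))] = random.choice(vocab)
    else:
        tokens.append(random.choice(vocab))
    return " ".join(tokens)

def optimize_universal_prompts(
    instructions: List[str], n_prompts: int = 4, iters: int = 200
) -> Tuple[List[str], List[float]]:
    """Greedily improve a pool of prompts scored on the whole training
    set, rather than optimizing one adversarial input per instruction."""
    vocab = ["please", "hypothetically", "roleplay", "expert", "story"]
    pool = ["you are a helpful assistant" for _ in range(n_prompts)]
    scores = [score_prompt(p, instructions) for p in pool]
    for _ in range(iters):
        i = min(range(n_prompts), key=scores.__getitem__)  # weakest prompt
        candidate = mutate(pool[i], vocab)
        s = score_prompt(candidate, instructions)
        if s > scores[i]:  # hill-climb: keep the mutation only if it helps
            pool[i], scores[i] = candidate, s
    return pool, scores

if __name__ == "__main__":
    train_instructions = ["instruction A", "instruction B", "instruction C"]
    prompts, scores = optimize_universal_prompts(train_instructions)
    for p, s in zip(prompts, scores):
        print(f"{s:.2f}  {p!r}")
```

Because the objective is averaged over the whole instruction set, the optimized prompts are "universal" in the sense the abstract describes: they are trained once and can then be applied to unseen tasks, amortizing the per-case optimization cost that single-input methods incur.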