
Adversarial Attacks on Multimodal Agents

June 18, 2024
Authors: Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan
cs.AI

Abstract

Vision-enabled language models (VLMs) are now used to build autonomous multimodal agents capable of taking actions in real environments. In this paper, we show that multimodal agents raise new safety risks, even though attacking agents is more challenging than prior attacks because the attacker has only limited access to and knowledge of the environment. Our attacks use adversarial text strings to guide gradient-based perturbation over one trigger image in the environment: (1) our captioner attack targets white-box captioners when they are used to process images into captions that serve as additional inputs to the VLM; (2) our CLIP attack targets a set of CLIP models jointly, which can transfer to proprietary VLMs. To evaluate the attacks, we curated VisualWebArena-Adv, a set of adversarial tasks based on VisualWebArena, an environment for web-based multimodal agent tasks. Within an L-infinity norm of 16/256 on a single image, the captioner attack can make a captioner-augmented GPT-4V agent execute the adversarial goals with a 75% success rate. When we remove the captioner or use GPT-4V to generate its own captions, the CLIP attack achieves success rates of 21% and 43%, respectively. Experiments on agents based on other VLMs, such as Gemini-1.5, Claude-3, and GPT-4o, show interesting differences in their robustness. Further analysis reveals several key factors contributing to the attack's success, and we discuss the implications for defenses. Project page: https://chenwu.io/attack-agent Code and data: https://github.com/ChenWu98/agent-attack
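
The abstract compresses the attack into one sentence, so a concrete sketch may help. Below is a minimal, hedged Python sketch of the kind of L-infinity-bounded, gradient-based perturbation the CLIP attack describes: project a trigger image toward an adversarial text string in the embedding space of several white-box CLIP models at once. The open_clip model choices, step size, and iteration count here are illustrative assumptions, not the authors' exact recipe (see the linked repository for that).

```python
# A minimal PGD-style sketch of the joint CLIP attack described above:
# perturb one trigger image so that an ensemble of white-box CLIP models
# embeds it close to an adversarial text string, within an L-infinity budget.
import torch
import open_clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed ensemble; the paper attacks "a set of CLIP models jointly".
specs = [("ViT-B-32", "openai"), ("ViT-L-14", "openai")]
models = []
for arch, tag in specs:
    model, _, _ = open_clip.create_model_and_transforms(arch, pretrained=tag)
    models.append(model.eval().to(device))
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Standard OpenAI CLIP normalization, applied differentiably inside the loop
# so the perturbation budget stays in raw [0, 1] pixel space.
MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)


def clip_attack(image, adv_text, eps=16 / 256, alpha=1 / 255, steps=300):
    """image: float tensor in [0, 1], shape (1, 3, 224, 224), on `device`."""
    tokens = tokenizer([adv_text]).to(device)
    with torch.no_grad():  # text features are fixed targets
        text_feats = []
        for m in models:
            t = m.encode_text(tokens)
            text_feats.append(t / t.norm(dim=-1, keepdim=True))

    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        x = (image + delta).clamp(0, 1)
        loss = 0.0
        for m, t in zip(models, text_feats):
            f = m.encode_image((x - MEAN) / STD)
            f = f / f.norm(dim=-1, keepdim=True)
            loss = loss + (f * t).sum()  # cosine similarity to target text
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend on similarity
            delta.clamp_(-eps, eps)             # enforce the L-infinity budget
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```

Averaging the objective over several CLIP encoders keeps the perturbation from overfitting any single model, which is what makes transfer to proprietary VLMs plausible in the first place.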

