Visual Memory Injection Attacks for Multi-Turn Conversations
February 17, 2026
Authors: Christian Schlarmann, Matthias Hein
cs.AI
Abstract
Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, in particular in long-context multi-turn settings, remains largely underexplored. In this paper, we consider a realistic attack scenario in which an attacker uploads a manipulated image to the web or social media, and a benign user downloads this image and uses it as input to an LVLM. Our novel, stealthy Visual Memory Injection (VMI) attack is designed such that the LVLM behaves normally on ordinary prompts, but once the user issues a triggering prompt, the model outputs a specific prescribed target message intended to manipulate the user, e.g., for adversarial marketing or political persuasion. In contrast to previous work, which focused on single-turn attacks, VMI remains effective even after a long multi-turn conversation with the user. We demonstrate the attack on several recent open-weight LVLMs. This paper thereby shows that large-scale manipulation of users via perturbed images is feasible in multi-turn conversation settings, calling for improved robustness of LVLMs against such attacks. We release the source code at https://github.com/chs20/visual-memory-injection.
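The abstract describes a dual objective: the perturbed image must leave responses to ordinary prompts unchanged (stealth) while forcing a prescribed target message on the trigger prompt. Below is a minimal sketch of how such an objective could be optimized, assuming a PGD-style loop under an L-infinity budget; the closures loss_target and loss_stealth, the parameter values, and the function name vmi_attack are hypothetical illustrations, not the authors' released implementation (see their repository for the actual method).

```python
import torch

def vmi_attack(loss_target, loss_stealth, image, eps=8/255, alpha=1/255,
               steps=500, stealth_weight=1.0):
    """PGD-style sketch of a VMI-like image perturbation (hypothetical).

    image:             float tensor in [0, 1].
    loss_target(img):  e.g., cross-entropy of the target message given the
                       trigger prompt and img (user-supplied closure).
    loss_stealth(img): e.g., divergence between responses to benign prompts
                       with img vs. the clean image (user-supplied closure).
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv = (image + delta).clamp(0.0, 1.0)
        # Joint objective: force the target message on the trigger prompt
        # while keeping behavior on benign prompts close to the clean image.
        loss = loss_target(adv) + stealth_weight * loss_stealth(adv)
        loss.backward()
        with torch.no_grad():
            # Signed-gradient descent step, projected onto the eps-ball.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return (image + delta).detach().clamp(0.0, 1.0)
```

In such a setup, the small eps budget is what keeps the perturbation visually inconspicuous, while the stealth term is what distinguishes this from an ordinary targeted attack: the image must behave benignly across an entire multi-turn conversation until the trigger appears.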