
Visual Memory Injection Attacks for Multi-Turn Conversations

February 17, 2026
Authors: Christian Schlarmann, Matthias Hein
cs.AI

Abstract

Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, in particular in a long-context multi-turn setting, is largely underexplored. In this paper, we consider the realistic scenario in which an attacker uploads a manipulated image to the web/social media. A benign user downloads this image and uses it as input to the LVLM. Our novel stealthy Visual Memory Injection (VMI) attack is designed such that on normal prompts the LVLM exhibits nominal behavior, but once the user gives a triggering prompt, the LVLM outputs a specific prescribed target message to manipulate the user, e.g., for adversarial marketing or political persuasion. Compared to previous work that focused on single-turn attacks, VMI is effective even after a long multi-turn conversation with the user. We demonstrate our attack on several recent open-weight LVLMs. This article thereby shows that large-scale manipulation of users is feasible with perturbed images in multi-turn conversation settings, calling for better robustness of LVLMs against these attacks. We release the source code at https://github.com/chs20/visual-memory-injection.
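The abstract describes a dual objective: the perturbed image must leave the model's behavior unchanged on normal prompts while steering it toward a prescribed target message on the trigger prompt. The toy sketch below illustrates this trade-off on a linear stand-in model with numpy; it is a hypothetical simplification for intuition only, not the authors' method (their actual optimization against LVLMs is in the linked repository). All names, the loss weights, and the linear "model" are assumptions introduced here.

```python
import numpy as np

# Hypothetical toy sketch of a VMI-style dual objective (NOT the paper's
# implementation): find a small L-infinity-bounded perturbation delta s.t.
#   - the "target-message" score on the trigger prompt goes up, and
#   - outputs on normal prompts stay close to the clean outputs (stealth).

rng = np.random.default_rng(0)
d = 64                            # flattened toy "image" dimension
image = rng.uniform(0, 1, d)      # clean image, pixels in [0, 1]
w_trigger = rng.normal(size=d)    # toy logit for target message on trigger prompt
W_normal = rng.normal(size=(4, d))  # toy logits on a few normal prompts
eps = 8 / 255                     # L-inf budget, a common choice in attack papers
lam = 0.1                         # weight of the stealth (nominal-behavior) term
lr = 0.05

clean_out = W_normal @ image      # reference outputs on normal prompts

delta = np.zeros(d)
for _ in range(200):
    # gradient of:  -target_score + lam * ||normal_out - clean_out||^2
    grad = -w_trigger + 2 * lam * W_normal.T @ (W_normal @ (image + delta) - clean_out)
    delta -= lr * grad
    delta = np.clip(delta, -eps, eps)             # project into the L-inf ball
    delta = np.clip(image + delta, 0, 1) - image  # keep pixels in valid range

adv = image + delta
budget_ok = np.max(np.abs(delta)) <= eps + 1e-9   # perturbation stays within budget
score_up = w_trigger @ adv > w_trigger @ image    # target-message score increased
print(budget_ok, score_up)
```

The `lam` term is what makes the attack "stealthy" in this sketch: with `lam = 0`, the loop reduces to plain targeted gradient descent on the trigger score; raising `lam` trades attack strength for closeness to nominal behavior on normal prompts.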