Visual Memory Injection Attacks for Multi-Turn Conversations

Abstract

Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, particularly in long-context multi-turn settings, remains largely unexplored. In this paper, we consider a realistic scenario in which an attacker uploads a manipulated image to the web or social media, and a benign user downloads this image and uses it as input to an LVLM. Our novel, stealthy Visual Memory Injection (VMI) attack is designed so that the LVLM exhibits nominal behavior on normal prompts, but once the user gives a triggering prompt, the LVLM outputs a specific prescribed target message to manipulate the user, e.g., for adversarial marketing or political persuasion. In contrast to previous work, which focused on single-turn attacks, VMI remains effective even after a long multi-turn conversation with the user. We demonstrate our attack on several recent open-weight LVLMs. We thereby show that large-scale manipulation of users is feasible with perturbed images in multi-turn conversation settings, which calls for better robustness of LVLMs against these attacks. We release the source code at https://github.com/chs20/visual-memory-injection

