Visual Memory Injection Attacks for Multi-Turn Conversations

Abstract

Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, particularly in long-context multi-turn settings, remains largely underexplored. In this paper, we consider the realistic scenario in which an attacker uploads a manipulated image to the web or social media. A benign user downloads this image and uses it as input to the LVLM. Our novel, stealthy Visual Memory Injection (VMI) attack is designed so that the LVLM behaves nominally on normal prompts, but once the user gives a triggering prompt, it outputs a specific, prescribed target message to manipulate the user, e.g., for adversarial marketing or political persuasion. Unlike previous work, which focused on single-turn attacks, VMI remains effective even after a long multi-turn conversation with the user. We demonstrate our attack on several recent open-weight LVLMs. This article thereby shows that large-scale manipulation of users is feasible with perturbed images in multi-turn conversation settings, calling for better robustness of LVLMs against these attacks. We release the source code at https://github.com/chs20/visual-memory-injection.