Visual Memory Injection Attacks for Multi-Turn Conversations

Abstract

Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, particularly in long-context multi-turn settings, remains largely unexplored. In this paper, we consider the realistic scenario in which an attacker uploads a manipulated image to the web or social media, and a benign user downloads this image and uses it as input to an LVLM. Our novel, stealthy Visual Memory Injection (VMI) attack is designed so that the LVLM behaves nominally on normal prompts, but once the user gives a triggering prompt, it outputs a specific, prescribed target message to manipulate the user, e.g., for adversarial marketing or political persuasion. In contrast to previous work, which focused on single-turn attacks, VMI remains effective even after a long multi-turn conversation with the user. We demonstrate our attack on several recent open-weight LVLMs. This paper thereby shows that large-scale manipulation of users is feasible with perturbed images in multi-turn conversation settings, calling for better robustness of LVLMs against such attacks. We release the source code at https://github.com/chs20/visual-memory-injection.
