IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding

Abstract

Vision-language models (VLMs) have made significant advances in tasks such as visual grounding, where they localize specific objects in an image based on a natural language query. However, the security of VLMs in visual grounding remains underexplored, especially with respect to backdoor attacks. In this paper, we introduce IAG, a novel input-aware backdoor attack designed to manipulate the grounding behavior of VLMs. The attack forces the model to ground a specific target object in the input image, regardless of the user's query. We propose an adaptive trigger generator that embeds the semantic information of the attack target's description into the original image using a text-conditional U-Net, thereby overcoming the open-vocabulary attack challenge. To keep the attack stealthy, we use a reconstruction loss that minimizes the visual discrepancy between poisoned and clean images. We also introduce a unified method for generating attack data. We evaluate IAG both theoretically and empirically, demonstrating its feasibility and effectiveness. Notably, our ASR@0.5 on InternVL-2.5-8B exceeds 65% across various test sets. IAG also shows promising potential in manipulating Ferret-7B and LLaVA-1.5-7B, with very little accuracy decrease on clean samples. Further experiments, including ablation studies and evaluation against potential defenses, also indicate the robustness and transferability of our attack.
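The abstract reports ASR@0.5 without defining it. The sketch below shows one plausible reading, assuming the metric follows the usual visual grounding convention: an attack counts as successful when the predicted box overlaps the attacker's target box with an intersection-over-union (IoU) of at least 0.5. The function names, the (x_min, y_min, x_max, y_max) box format, and the inclusive threshold are assumptions for illustration, not the paper's implementation.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def attack_success_rate(
    predicted: Sequence[Box],
    attack_targets: Sequence[Box],
    threshold: float = 0.5,
) -> float:
    """Fraction of poisoned samples whose predicted box matches the
    attacker's target box with IoU >= threshold (assumed definition)."""
    if len(predicted) != len(attack_targets):
        raise ValueError("predicted and attack_targets must have equal length")
    if not predicted:
        return 0.0
    hits = sum(
        iou(p, t) >= threshold for p, t in zip(predicted, attack_targets)
    )
    return hits / len(predicted)
```

Under this reading, the boxes are compared against the attack target rather than the object named in the user's query, so a high value means the backdoor overrides the query.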
