
IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding

August 13, 2025
Authors: Junxian Li, Beining Xu, Di Zhang
cs.AI

Abstract

Vision-language models (VLMs) have shown significant advances in tasks such as visual grounding, where they localize specific objects in images based on natural language queries. However, the security of VLMs on visual grounding tasks remains underexplored, especially in the context of backdoor attacks. In this paper, we introduce IAG, a novel input-aware backdoor attack method designed to manipulate the grounding behavior of VLMs. The attack forces the model to ground a specific target object in the input image regardless of the user's query. We propose an adaptive trigger generator that embeds the semantic information of the attack target's description into the original image using a text-conditional U-Net, thereby overcoming the open-vocabulary attack challenge. To ensure stealthiness, we use a reconstruction loss to minimize visual discrepancies between poisoned and clean images. We also introduce a unified method for generating attack data. IAG is evaluated both theoretically and empirically, demonstrating its feasibility and effectiveness. Notably, our ASR@0.5 on InternVL-2.5-8B exceeds 65% on various test sets. IAG also shows promising results in manipulating Ferret-7B and LLaVA-1.5-7B, with very little accuracy loss on clean samples. Extensive further experiments, including ablation studies and potential defenses, also indicate the robustness and transferability of our attack.
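
The abstract names two concrete components, a text-conditional U-Net trigger generator and a reconstruction loss for stealth, but gives no implementation details. The following minimal PyTorch sketch illustrates how such a generator might be wired; the FiLM-style conditioning, the network sizes, the 0.05 residual bound, and the use of a pre-computed embedding of the attack target's description (e.g., from a frozen CLIP text encoder) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FiLM(nn.Module):
    # Scale/shift feature maps with a projection of the target-text embedding.
    def __init__(self, text_dim, channels):
        super().__init__()
        self.proj = nn.Linear(text_dim, 2 * channels)

    def forward(self, x, text_emb):
        gamma, beta = self.proj(text_emb).chunk(2, dim=-1)
        return x * (1 + gamma[..., None, None]) + beta[..., None, None]

class TriggerUNet(nn.Module):
    # Toy text-conditional U-Net that emits a small, bounded trigger residual.
    def __init__(self, text_dim=512, ch=32):
        super().__init__()
        self.enc1 = nn.Conv2d(3, ch, 3, padding=1)
        self.enc2 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)
        self.film = FiLM(text_dim, ch * 2)
        self.dec1 = nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1)
        self.out = nn.Conv2d(ch * 2, 3, 3, padding=1)

    def forward(self, img, text_emb):
        h1 = F.relu(self.enc1(img))
        h2 = F.relu(self.enc2(h1))
        h2 = self.film(h2, text_emb)            # inject attack-target semantics
        h = F.relu(self.dec1(h2))
        h = torch.cat([h, h1], dim=1)           # U-Net skip connection
        return 0.05 * torch.tanh(self.out(h))   # bounded residual keeps the trigger subtle

gen = TriggerUNet()
img = torch.rand(2, 3, 224, 224)                # clean images in [0, 1]
text_emb = torch.randn(2, 512)                  # stand-in for an encoded target description
poisoned = (img + gen(img, text_emb)).clamp(0, 1)

recon_loss = F.l1_loss(poisoned, img)           # stealthiness: keep poisoned close to clean
# A full attack objective would add a grounding term on the victim VLM, e.g.
# attack_loss(vlm(poisoned, query), target_box) + lambda_rec * recon_loss.

In the paper's framework, the reconstruction term would be trained jointly with a grounding attack loss computed on the victim VLM's output for the poisoned image; that loss is omitted here since it depends on the victim model's interface.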