IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding

Abstract

Vision-language models (VLMs) have advanced significantly in tasks such as visual grounding, where the model localizes a specific object in an image based on a natural language query. However, the security of VLMs in visual grounding remains underexplored, especially with respect to backdoor attacks. In this paper, we introduce IAG, a novel input-aware backdoor attack designed to manipulate the grounding behavior of VLMs. The attack forces the model to ground a specific target object in the input image, regardless of the user's query. We propose an adaptive trigger generator that uses a text-conditional U-Net to embed the semantic information of the attack target's description into the original image, thereby overcoming the open-vocabulary attack challenge. To keep the attack stealthy, we use a reconstruction loss to minimize the visual discrepancy between poisoned and clean images. Additionally, we introduce a unified method for generating attack data. IAG is evaluated both theoretically and empirically, demonstrating its feasibility and effectiveness. Notably, ASR@0.5 on InternVL-2.5-8B exceeds 65% on various test sets. IAG also shows promising potential for manipulating Ferret-7B and LLaVA-1.5-7B, with very little accuracy loss on clean samples. Further experiments, including ablation studies and an evaluation against potential defenses, also indicate the robustness and transferability of the attack.
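As a rough illustration of the ASR@0.5 figure quoted above, the sketch below assumes the metric counts an attack as successful when the predicted box overlaps the attacker's target box with an intersection-over-union (IoU) of at least 0.5. The abstract does not define the metric, so this reading, the (x1, y1, x2, y2) box format, and the function names `iou` and `asr_at` are illustrative assumptions rather than the authors' evaluation code.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def asr_at(preds: Sequence[Box], targets: Sequence[Box],
           threshold: float = 0.5) -> float:
    """Fraction of poisoned samples whose predicted box matches the
    attacker's target box with IoU >= threshold (assumed definition)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(preds, targets))
    return hits / len(preds)


# Hypothetical predictions on three poisoned images and the attack-target boxes.
example_preds = [(10, 10, 50, 50), (0, 0, 20, 20), (30, 30, 60, 60)]
example_targets = [(12, 12, 50, 50), (40, 40, 80, 80), (30, 30, 60, 90)]
print(f"ASR@0.5 = {asr_at(example_preds, example_targets):.3f}")
```

In this made-up example, two of the three predictions reach the 0.5 IoU threshold against the attack target, so the sketch reports an attack success rate of 2/3.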
