IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding

Abstract

Vision-language models (VLMs) have advanced significantly on tasks such as visual grounding, where the model localizes a specific object in an image given a natural language query. However, the security of VLMs in visual grounding remains underexplored, especially with respect to backdoor attacks. In this paper, we introduce IAG, a novel input-aware backdoor attack designed to manipulate the grounding behavior of VLMs. The attack forces the model to ground a specific target object in the input image, regardless of the user's query. To address the open-vocabulary nature of the attack, we propose an adaptive trigger generator that uses a text-conditional U-Net to embed the semantic information of the attack target's description into the original image. To keep the attack stealthy, we use a reconstruction loss that minimizes the visual discrepancy between poisoned and clean images. We also introduce a unified method for generating attack data. IAG is evaluated both theoretically and empirically, demonstrating its feasibility and effectiveness. Notably, ASR@0.5 on InternVL-2.5-8B exceeds 65% on various test sets. IAG also shows promising potential for manipulating Ferret-7B and LLaVA-1.5-7B, with very little accuracy decrease on clean samples. Further experiments, including ablation studies and evaluations against potential defenses, indicate the robustness and transferability of our attack.
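The abstract reports ASR@0.5 without defining it. As a minimal sketch, assuming the metric follows the usual grounding convention (an attack counts as successful when the box predicted for a poisoned input overlaps the attacker's target object with IoU of at least 0.5), it could be computed as below. The names `iou` and `asr_at` are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def asr_at(predicted: Sequence[Box], target: Sequence[Box],
           threshold: float = 0.5) -> float:
    """Fraction of poisoned samples whose predicted box matches the
    attacker's target box with IoU >= threshold (assumed definition)."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not predicted:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(predicted, target))
    return hits / len(predicted)
```

Here `predicted` would hold the model's output boxes on poisoned images and `target` the ground-truth boxes of the attack target object, not of the object named in the user's query.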
