Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
April 2, 2025
Authors: Jiawei Wang, Yushen Zuo, Yuanjun Chai, Zhendong Liu, Yichen Fu, Yichun Feng, Kin-man Lam
cs.AI
Abstract
Vision-Language Models (VLMs) extend the capabilities of Large Language
Models (LLMs) by incorporating visual information, yet they remain vulnerable
to jailbreak attacks, especially when processing noisy or corrupted images.
Although existing VLMs adopt security measures during training to mitigate such
attacks, vulnerabilities associated with noise-augmented visual inputs are
overlooked. In this work, we find that the absence of noise-augmented training
causes critical security gaps: many VLMs are susceptible to even simple
perturbations such as Gaussian noise. To address this challenge, we propose
Robust-VLGuard, a multimodal safety dataset with aligned/misaligned
image-text pairs, combined with noise-augmented fine-tuning that reduces attack
success rates while preserving VLM functionality. For stronger
optimization-based visual perturbation attacks, we propose DiffPure-VLM,
leveraging diffusion models to convert adversarial perturbations into
Gaussian-like noise, which VLMs with noise-augmented safety fine-tuning can
defend against. Experimental results demonstrate that the distribution-shifting
property of diffusion models aligns well with our fine-tuned VLMs, significantly
mitigating adversarial perturbations across varying intensities. The dataset
and code are available at https://github.com/JarvisUSTC/DiffPure-RobustVLM.
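
To make the Gaussian-noise perturbation mentioned in the abstract concrete, here is a minimal sketch of adding zero-mean Gaussian noise to an image and applying it as a random augmentation during fine-tuning. It is not the authors' implementation: the function names, the flat-list image representation, and the values of `sigma`, `sigma_max`, and `prob` are all illustrative assumptions, and the code is kept dependency-free, whereas a real pipeline would likely operate on image tensors.

```python
import random


def add_gaussian_noise(pixels, sigma=0.1, seed=None):
    """Add i.i.d. zero-mean Gaussian noise to pixel values in [0, 1].

    `pixels` is a flat list of floats; `sigma` is the noise standard
    deviation (an illustrative value, not one taken from the paper).
    """
    rng = random.Random(seed)
    # Clip each noisy value so the image stays in the valid [0, 1] range.
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def augment_for_finetuning(pixels, prob=0.5, sigma_max=0.1, seed=None):
    """With probability `prob`, perturb the image with a random noise level.

    Hypothetical augmentation policy: the noise level is drawn uniformly
    from [0, sigma_max]; otherwise the image is returned unchanged.
    """
    rng = random.Random(seed)
    if rng.random() >= prob:
        return list(pixels)
    sigma = rng.uniform(0.0, sigma_max)
    return add_gaussian_noise(pixels, sigma, seed=rng.randrange(2**32))


if __name__ == "__main__":
    clean = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(add_gaussian_noise(clean, sigma=0.1, seed=0))
```

Clipping keeps the perturbed image in a valid pixel range, and seeding makes a given perturbation reproducible, which helps when comparing attack success rates on the same noisy inputs.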