Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
April 2, 2025
Authors: Jiawei Wang, Yushen Zuo, Yuanjun Chai, Zhendong Liu, Yichen Fu, Yichun Feng, Kin-man Lam
cs.AI
Abstract
Vision-Language Models (VLMs) extend the capabilities of Large Language
Models (LLMs) by incorporating visual information, yet they remain vulnerable
to jailbreak attacks, especially when processing noisy or corrupted images.
Although existing VLMs adopt security measures during training to mitigate such
attacks, vulnerabilities associated with noise-augmented visual inputs are
overlooked. In this work, we find that the lack of noise-augmented training
causes critical security gaps: many VLMs are susceptible to even simple
perturbations such as Gaussian noise. To address this challenge, we propose
Robust-VLGuard, a multimodal safety dataset with aligned/misaligned
image-text pairs, combined with noise-augmented fine-tuning that reduces attack
success rates while preserving VLM functionality. For stronger
optimization-based visual perturbation attacks, we propose DiffPure-VLM,
leveraging diffusion models to convert adversarial perturbations into
Gaussian-like noise, which VLMs with noise-augmented safety fine-tuning can
defend against. Experimental results demonstrate that the distribution-shifting
property of diffusion models aligns well with our fine-tuned VLMs, significantly
mitigating adversarial perturbations across varying intensities. The dataset
and code are available at https://github.com/JarvisUSTC/DiffPure-RobustVLM.
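
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes images are NumPy float arrays, and the function names `add_gaussian_noise` and `forward_diffuse`, along with the parameters `sigma` and `alpha_bar`, are hypothetical choices made for illustration. The first function shows the kind of Gaussian noise augmentation that could be applied to images during safety fine-tuning. The second shows the standard DDPM-style forward noising step that diffusion-based purification generally relies on, which blends an input with Gaussian noise so that a structured perturbation becomes increasingly Gaussian-like.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to an image with values in [0, 1].

    `sigma` is the noise standard deviation (an illustrative parameter,
    not a value taken from the paper). The result is clipped to [0, 1].
    """
    noise = rng.normal(0.0, sigma, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)


def forward_diffuse(image, alpha_bar, rng):
    """Apply one closed-form DDPM-style forward noising step.

    Computes sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps with
    eps ~ N(0, I). `alpha_bar` in (0, 1] is the cumulative noise-schedule
    term; smaller values mean more of the signal is replaced by noise.
    """
    eps = rng.normal(0.0, 1.0, size=image.shape)
    return np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * eps


# Illustrative usage on a synthetic gray image (assumed shape H x W x C).
rng = np.random.default_rng(0)
image = np.full((8, 8, 3), 0.5)
noisy = add_gaussian_noise(image, sigma=0.1, rng=rng)
diffused = forward_diffuse(image, alpha_bar=0.9, rng=rng)
```

In a fine-tuning pipeline, a function like `add_gaussian_noise` would typically be applied to a random subset of training images with a randomly drawn `sigma`; the actual augmentation schedule and purification procedure are defined in the authors' repository.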