Contamination Detection for VLMs using Multi-Modal Semantic Perturbation
November 5, 2025
Authors: Jaden Park, Mu Cai, Feng Yao, Jingbo Shang, Soochahn Lee, Yong Jae Lee
cs.AI
Abstract
Recent advances in Vision-Language Models (VLMs) have achieved
state-of-the-art performance on numerous benchmark tasks. However, the use of
internet-scale, often proprietary, pretraining corpora raises a critical
concern for both practitioners and users: inflated performance due to test-set
leakage. While prior works have proposed mitigation strategies such as
decontamination of pretraining data and benchmark redesign for LLMs, the
complementary direction of developing detection methods for contaminated VLMs
remains underexplored. To address this gap, we deliberately contaminate
open-source VLMs on popular benchmarks and show that existing detection
approaches either fail outright or exhibit inconsistent behavior. We then
propose a novel, simple yet effective detection method based on multi-modal
semantic perturbation, demonstrating that contaminated models fail to
generalize under controlled perturbations. Finally, we validate our approach
across multiple realistic contamination strategies, confirming its robustness
and effectiveness. The code and perturbed dataset will be released publicly.
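
The abstract's central claim, that contaminated models fail to generalize under controlled perturbations, can be illustrated with a minimal sketch. It assumes that each benchmark item has a semantically perturbed counterpart with its own reference answer, and that detection compares accuracy on the two sets. The function names, the exact-match scoring, and the 0.1 threshold are hypothetical choices made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one example")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def perturbation_gap(
    orig_preds: Sequence[str],
    orig_answers: Sequence[str],
    pert_preds: Sequence[str],
    pert_answers: Sequence[str],
) -> float:
    """Accuracy on original items minus accuracy on perturbed items."""
    return accuracy(orig_preds, orig_answers) - accuracy(pert_preds, pert_answers)


def looks_contaminated(gap: float, threshold: float = 0.1) -> bool:
    """Hypothetical decision rule: flag a model whose accuracy drops sharply
    once the benchmark items are perturbed. The threshold is an assumption."""
    return gap > threshold


# Toy data: this model answers every original item correctly but only one
# perturbed item, a pattern one might expect from memorized test answers.
memorized_gap = perturbation_gap(
    ["A", "B", "C", "D"], ["A", "B", "C", "D"],
    ["A", "B", "C", "D"], ["A", "C", "D", "B"],
)

# Toy data: this model scores the same on both sets.
stable_gap = perturbation_gap(
    ["A", "B", "C", "A"], ["A", "B", "C", "D"],
    ["B", "C", "D", "B"], ["B", "C", "D", "A"],
)
```

In this toy setup the first model would be flagged and the second would not. A real evaluation would need a principled way to choose the threshold and to account for perturbed items being harder or easier than the originals.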