Contamination Detection for VLMs using Multi-Modal Semantic Perturbation

Abstract

Recent advances in Vision-Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora raises a critical concern for both practitioners and users: inflated performance due to test-set leakage. While prior work has proposed mitigation strategies for LLMs, such as decontaminating pretraining data and redesigning benchmarks, the complementary direction of developing detection methods for contaminated VLMs remains underexplored. To address this gap, we deliberately contaminate open-source VLMs on popular benchmarks and show that existing detection approaches either fail outright or behave inconsistently. We then propose a simple yet effective detection method based on multi-modal semantic perturbation, and demonstrate that contaminated models fail to generalize under controlled perturbations. Finally, we validate our approach across multiple realistic contamination strategies, confirming its robustness and effectiveness. The code and perturbed dataset will be released publicly.
