

VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction

March 14, 2026
Authors: Hiroto Nakata, Yawen Zou, Shunsuke Sakai, Shun Maeda, Chunzhi Gu, Yijin Wei, Shangce Gao, Chao Zhang
cs.AI

Abstract

Logical anomaly detection in industrial inspection remains challenging due to variations in visual appearance (e.g., background clutter, illumination shift, and blur), which often distract vision-centric detectors from identifying rule-level violations. However, existing benchmarks rarely provide controlled settings where logical states are fixed while such nuisance factors vary. To address this gap, we introduce VID-AD, a dataset for logical anomaly detection under vision-induced distraction. It comprises 10 manufacturing scenarios and five capture conditions, totaling 50 one-class tasks and 10,395 images. Each scenario is defined by two logical constraints selected from quantity, length, type, placement, and relation, with anomalies including both single-constraint and combined violations. We further propose a language-based anomaly detection framework that relies solely on text descriptions generated from normal images. Using contrastive learning with positive texts and contradiction-based negative texts synthesized from these descriptions, our method learns embeddings that capture logical attributes rather than low-level features. Extensive experiments demonstrate consistent improvements over baselines across the evaluated settings. The dataset is available at: https://github.com/nkthiroto/VID-AD.
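
To make the contrastive training idea concrete, here is a minimal, hypothetical PyTorch sketch: text descriptions of normal images are encoded as positives, contradiction-based rewrites as negatives, and a simple contrastive objective pulls positives toward a normal prototype while pushing negatives away. The toy TextEncoder, the loss form, and all names are illustrative assumptions, not the released VID-AD implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Toy bag-of-tokens text encoder standing in for a real language model."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids, offsets):
        # L2-normalized sentence embeddings.
        return F.normalize(self.proj(self.embed(token_ids, offsets)), dim=-1)

def contrastive_loss(pos_emb, neg_emb, temperature=0.07):
    # Prototype of normal descriptions: positives should score high against it,
    # contradiction-based negatives low.
    prototype = F.normalize(pos_emb.mean(dim=0, keepdim=True), dim=-1)
    pos_logits = (pos_emb @ prototype.t()).squeeze(1) / temperature
    neg_logits = (neg_emb @ prototype.t()).squeeze(1) / temperature
    logits = torch.cat([pos_logits, neg_logits])
    labels = torch.cat([torch.ones_like(pos_logits), torch.zeros_like(neg_logits)])
    return F.binary_cross_entropy_with_logits(logits, labels)

# Dummy usage: four descriptions, the first two "normal", the last two contradictions.
encoder = TextEncoder()
tokens = torch.randint(0, 1000, (40,))    # concatenated token ids
offsets = torch.tensor([0, 10, 20, 30])   # start index of each description
embeddings = encoder(tokens, offsets)     # shape (4, 128)
loss = contrastive_loss(embeddings[:2], embeddings[2:])
loss.backward()
```

At test time, a description's similarity to the normal prototype could serve as an inverse anomaly score; whether the paper scores anomalies this way is not stated in the abstract.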