VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction
March 14, 2026
Authors: Hiroto Nakata, Yawen Zou, Shunsuke Sakai, Shun Maeda, Chunzhi Gu, Yijin Wei, Shangce Gao, Chao Zhang
cs.AI
Abstract
Logical anomaly detection in industrial inspection remains challenging due to variations in visual appearance (e.g., background clutter, illumination shift, and blur), which often distract vision-centric detectors from identifying rule-level violations. However, existing benchmarks rarely provide controlled settings where logical states are fixed while such nuisance factors vary. To address this gap, we introduce VID-AD, a dataset for logical anomaly detection under vision-induced distraction. It comprises 10 manufacturing scenarios and five capture conditions, totaling 50 one-class tasks and 10,395 images. Each scenario is defined by two logical constraints selected from quantity, length, type, placement, and relation, with anomalies including both single-constraint and combined violations. We further propose a language-based anomaly detection framework that relies solely on text descriptions generated from normal images. Using contrastive learning with positive texts and contradiction-based negative texts synthesized from these descriptions, our method learns embeddings that capture logical attributes rather than low-level features. Extensive experiments demonstrate consistent improvements over baselines across the evaluated settings. The dataset is available at: https://github.com/nkthiroto/VID-AD.
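The contrastive objective described in the abstract, pulling embeddings of normal-image descriptions together while pushing away contradiction-based negative texts, can be sketched as a standard InfoNCE-style loss. This is a minimal illustration, not the paper's actual formulation: the function name, the NumPy implementation, and the temperature value are assumptions, and the real framework would operate on outputs of a learned text encoder rather than raw arrays.

```python
import numpy as np

def contrastive_logical_loss(anchor, positive, negative, temperature=0.1):
    """InfoNCE-style loss over text embeddings: pulls each anchor toward its
    positive (normal-description) embedding and pushes it away from a batch of
    contradiction-based negative embeddings. Inputs are (batch, dim) arrays.
    Hypothetical sketch; the paper's loss design may differ."""
    def unit(x):
        # L2-normalize rows so dot products become cosine similarities
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = unit(anchor), unit(positive), unit(negative)
    pos_sim = np.sum(a * p, axis=-1) / temperature      # (B,) anchor-positive
    neg_sim = (a @ n.T) / temperature                   # (B, B) anchor-negative
    logits = np.concatenate([pos_sim[:, None], neg_sim], axis=1)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[:, 0].mean())               # positive sits at index 0
```

At inference time, an image's generated description can then be scored by its embedding distance to the learned normal-text cluster; descriptions violating a logical constraint should land closer to the contradiction side of the embedding space.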