
AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors

April 9, 2026
作者: Matic Fučka, Vitjan Zavrtanik, Danijel Skočaj
cs.AI

Abstract

Zero-shot anomaly detection aims to detect and localise abnormal regions in an image without access to any in-domain training images. While recent approaches leverage vision-language models (VLMs), such as CLIP, to transfer high-level concept knowledge, methods based on purely vision foundation models (VFMs), like DINOv2, have lagged behind in performance. We argue that this gap stems from two practical issues: (i) limited diversity in existing auxiliary anomaly detection datasets and (ii) overly shallow VFM adaptation strategies. To address both challenges, we propose AnomalyVFM, a general and effective framework that turns any pretrained VFM into a strong zero-shot anomaly detector. Our approach combines a robust three-stage synthetic dataset generation scheme with a parameter-efficient adaptation mechanism, utilising low-rank feature adapters and a confidence-weighted pixel loss. Together, these components enable modern VFMs to substantially outperform current state-of-the-art methods. More specifically, with RADIO as a backbone, AnomalyVFM achieves an average image-level AUROC of 94.1% across 9 diverse datasets, surpassing previous methods by a significant 3.3 percentage points. Project Page: https://maticfuc.github.io/anomaly_vfm/
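To make the two adaptation ingredients named in the abstract concrete, here is a minimal NumPy sketch of (a) a low-rank feature adapter applied on top of a frozen backbone projection and (b) a confidence-weighted per-pixel loss. This is a generic illustration of the ideas, not the paper's actual code: the dimensions, the function names `adapt` and `confidence_weighted_pixel_loss`, and the choice of binary cross-entropy for the pixel loss are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 768, 8, 4  # feature dim, adapter rank, number of patch tokens

# Frozen backbone projection (stands in for a pretrained VFM feature map).
W = rng.standard_normal((d, d)) / np.sqrt(d)

# Low-rank adapter: only A and B are trainable, adding 2*d*r parameters
# instead of d*d. B is zero-initialised so the adapter is a no-op at start.
A = rng.standard_normal((r, d)) / np.sqrt(d)
B = np.zeros((d, r))

def adapt(x):
    """Adapted features: frozen path plus a low-rank update x A^T B^T."""
    return x @ W.T + x @ A.T @ B.T

def confidence_weighted_pixel_loss(pred, target, conf):
    """Per-pixel BCE, down-weighted where the (synthetic) label is uncertain.

    `conf` holds per-pixel confidence weights in [0, 1]; pixels with low
    confidence contribute less to the averaged loss.
    """
    eps = 1e-7
    p = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    return float(np.sum(conf * bce) / np.sum(conf))

x = rng.standard_normal((n, d))
# With B = 0 the adapted features equal the frozen backbone output.
assert np.allclose(adapt(x), x @ W.T)
```

During training only `A` and `B` would receive gradients, which is what makes the adaptation parameter-efficient; the confidence weights would come from the synthetic dataset generation stage, where some anomaly labels are less reliable than others.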