MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
May 14, 2025
作者: Bin-Bin Gao
cs.AI
Abstract
Zero- and few-shot visual anomaly segmentation relies on powerful
vision-language models that detect unseen anomalies using manually designed
textual prompts. However, visual representations are inherently independent of
language. In this paper, we explore the potential of a pure visual foundation
model as an alternative to widely used vision-language models for universal
visual anomaly segmentation. We present a novel paradigm that unifies anomaly
segmentation into change segmentation. This paradigm enables us to leverage
large-scale synthetic image pairs, featuring object-level and local region
changes, derived from existing image datasets, which are independent of target
anomaly datasets. We propose a one-prompt Meta-learning framework for Universal
Anomaly Segmentation (MetaUAS) that is trained on this synthetic dataset and
then generalizes well to segment any novel or unseen visual anomalies in the
real world. To handle geometrical variations between prompt and query images,
we propose a soft feature alignment module that bridges paired-image change
perception and single-image semantic segmentation. This is the first work to
achieve universal anomaly segmentation using a pure vision model without
relying on specialized anomaly detection datasets or pre-trained vision-language
models. Our method effectively and efficiently segments any anomalies with only
one normal image prompt and is training-free, requiring no guidance from
language. Our MetaUAS significantly outperforms previous zero-shot, few-shot,
and even full-shot anomaly segmentation methods. The code and pre-trained
models are available at https://github.com/gaobb/MetaUAS.
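
To make the one-prompt change-segmentation idea concrete, below is a minimal PyTorch sketch of the inference flow described in the abstract: encode the normal prompt image and the query image, softly align prompt features to the query to absorb geometric variation, and decode the paired features into a per-pixel anomaly (change) mask. All class names, the toy encoder, and the attention-style alignment are illustrative assumptions, not the actual MetaUAS architecture; see the official repository for the real implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftFeatureAlignment(nn.Module):
    """Hypothetical soft alignment: re-samples prompt features at each query
    location via an attention-like soft matching (stand-in for the paper's
    soft feature alignment module)."""

    def forward(self, query_feat, prompt_feat):
        b, c, h, w = query_feat.shape
        q = query_feat.flatten(2).transpose(1, 2)   # (B, HW, C)
        p = prompt_feat.flatten(2).transpose(1, 2)  # (B, HW, C)
        attn = torch.softmax(q @ p.transpose(1, 2) / c ** 0.5, dim=-1)  # (B, HW, HW)
        aligned = attn @ p                          # prompt features per query location
        return aligned.transpose(1, 2).reshape(b, c, h, w)


class OnePromptAnomalySegmenter(nn.Module):
    """Minimal sketch of one-prompt change segmentation: encode both images,
    align prompt features to the query, decode their pairing into a mask."""

    def __init__(self, feat_dim=64):
        super().__init__()
        # Toy CNN encoder; the real model would use a pretrained vision backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.align = SoftFeatureAlignment()
        # Decoder maps concatenated (query, aligned-prompt) features to 1 channel.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim * 2, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, 1, 1),
        )

    def forward(self, query_img, prompt_img):
        q_feat = self.encoder(query_img)
        p_feat = self.encoder(prompt_img)
        p_aligned = self.align(q_feat, p_feat)  # absorb geometric variation
        logits = self.decoder(torch.cat([q_feat, p_aligned], dim=1))
        mask = torch.sigmoid(
            F.interpolate(logits, size=query_img.shape[-2:],
                          mode="bilinear", align_corners=False)
        )
        return mask  # per-pixel anomaly (change) probability


# Usage: one normal image serves as the prompt, the test image as the query.
model = OnePromptAnomalySegmenter()
prompt = torch.rand(1, 3, 256, 256)  # normal reference image
query = torch.rand(1, 3, 256, 256)   # image to inspect
anomaly_map = model(query, prompt)   # (1, 1, 256, 256)
```

In training, pairs like (query, prompt) would come from the large-scale synthetic change dataset the abstract describes (object-level and local-region edits applied to existing images), so the model never needs target anomaly data or language prompts.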