AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection

Abstract

Universal visual anomaly detection aims to identify anomalies in novel or unseen visual domains without additional fine-tuning, which is critical in open scenarios. Recent studies have demonstrated that pre-trained vision-language models such as CLIP generalize strongly given zero or only a few normal images. However, existing methods depend on hand-designed prompt templates, complex token interactions, or additional fine-tuning, which limits their flexibility. In this work, we present AdaptCLIP, a simple yet effective method based on two key insights. First, adaptive visual and textual representations should be learned alternately rather than jointly. Second, comparative learning between a query image and a normal image prompt should incorporate both contextual and aligned residual features rather than relying solely on residual features. AdaptCLIP treats CLIP models as a foundational service and adds only three simple adapters (a visual adapter, a textual adapter, and a prompt-query adapter) at its input or output ends. AdaptCLIP supports zero-/few-shot generalization across domains and, once trained on a base dataset, requires no further training on target domains. AdaptCLIP achieves state-of-the-art performance on 12 anomaly detection benchmarks from industrial and medical domains, significantly outperforming existing competitive methods. We will make the code and model of AdaptCLIP available at https://github.com/gaobb/AdaptCLIP.
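The abstract does not spell out how aligned residual features are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each query patch feature is matched to its most similar patch in the normal image prompt (by cosine similarity) before the residual is taken, and that context is added by summing the query feature with that residual. All function names here are hypothetical, and the tiny 2-D vectors stand in for real CLIP patch embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def plain_residuals(query_patches, prompt_patches):
    """Position-wise residuals: patch i of the query vs. patch i of the prompt."""
    return [[abs(x - y) for x, y in zip(q, p)]
            for q, p in zip(query_patches, prompt_patches)]


def aligned_residuals(query_patches, prompt_patches):
    """Assumed alignment: compare each query patch with its nearest prompt patch."""
    residuals = []
    for q in query_patches:
        nearest = max(prompt_patches, key=lambda p: cosine(q, p))
        residuals.append([abs(x - y) for x, y in zip(q, nearest)])
    return residuals


def contextual_aligned_features(query_patches, prompt_patches):
    """Assumed fusion: query feature (context) plus its aligned residual."""
    residuals = aligned_residuals(query_patches, prompt_patches)
    return [[x + r for x, r in zip(q, res)]
            for q, res in zip(query_patches, residuals)]


def patch_scores(query_patches, prompt_patches):
    """Toy anomaly score per patch: L2 norm of the aligned residual."""
    return [math.sqrt(sum(r * r for r in res))
            for res in aligned_residuals(query_patches, prompt_patches)]


if __name__ == "__main__":
    # The prompt holds the same normal content as the query, spatially swapped.
    prompt = [[0.0, 1.0], [1.0, 0.0]]
    query = [[1.0, 0.0], [0.0, 1.0]]
    print(plain_residuals(query, prompt))    # non-zero despite identical content
    print(aligned_residuals(query, prompt))  # zero once patches are matched
```

In this toy case the position-wise residual flags both patches even though the query contains nothing abnormal, while the aligned residual does not. That is the kind of failure that motivates going beyond plain residual features, under the assumptions stated above.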
