Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Abstract

Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these five problems, a generalized OOD detection framework was proposed that organizes them into a single taxonomy. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm and blurred the boundaries between these fields, once again confusing researchers. In this survey, we first present generalized OOD detection v2, which captures the evolution of AD, ND, OSR, OOD detection, and OD in the VLM era. Our framework reveals that, as some fields have become inactive and others have merged, the demanding challenges are now OOD detection and AD. We also highlight the significant shifts in definitions, problem settings, and benchmarks. We therefore provide a comprehensive review of OOD detection methodology, including a discussion of other related tasks that clarifies their relationship to OOD detection. Finally, we explore advances in the emerging era of Large Vision Language Models (LVLMs) such as GPT-4V. We conclude the survey with open challenges and future directions.
