Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

July 31, 2024
Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Yueqian Lin, Qing Yu, Go Irie, Shafiq Joty, Yixuan Li, Hai Li, Ziwei Liu, Toshihiko Yamasaki, Kiyoharu Aizawa
cs.AI

Abstract

Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework was proposed that organizes all five into a single taxonomy. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm and blurred the boundaries between these fields, once again leaving researchers confused. In this survey, we first present generalized OOD detection v2, which captures how AD, ND, OSR, OOD detection, and OD have evolved in the VLM era. Our framework reveals that, as some fields have become inactive and others have merged, the remaining demanding challenges are OOD detection and AD. We also highlight significant shifts in definitions, problem settings, and benchmarks; accordingly, we provide a comprehensive review of OOD detection methodology and discuss other related tasks to clarify their relationship to OOD detection. Finally, we explore advances in the emerging era of Large Vision Language Models (LVLMs) such as GPT-4V. We conclude with open challenges and future directions.
