AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

Abstract

Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments. Consequently, users must retrain models or develop separate AI models for each new environment, which requires machine learning expertise, high-performance hardware, and extensive data collection, and therefore limits the practical usability of VAD. To address these challenges, this study proposes the customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model. C-VAD treats user-defined text as an abnormal event and detects the frames in a video that contain the specified event. We implemented AnyAnomaly effectively using context-aware visual question answering, without fine-tuning the large vision language model (LVLM). To validate the effectiveness of the proposed model, we constructed C-VAD datasets and demonstrated the superiority of AnyAnomaly. Furthermore, our approach showed competitive performance on VAD benchmark datasets, achieving state-of-the-art results on the UBnormal dataset and outperforming other methods in generalization across all datasets. Our code is available online at github.com/SkiddieAhn/Paper-AnyAnomaly.
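The C-VAD task framing described above (user-defined text in, flagged frames out) can be illustrated with a minimal sketch. This is not the AnyAnomaly implementation: the function `detect_custom_anomalies`, the `Scorer` type, the `threshold` parameter, and `toy_scorer` are all hypothetical names introduced here for illustration. In particular, `toy_scorer` only matches keywords against hand-written tags and stands in for whatever score an LVLM answering a visual question would produce.

```python
from typing import Any, Callable, List, Sequence, Set

# Hypothetical scorer signature: (frame, anomaly_text) -> score in [0, 1].
# In the setting described in the abstract, an LVLM answering a visual
# question would play this role; here it is deliberately left abstract.
Scorer = Callable[[Any, str], float]


def detect_custom_anomalies(
    frames: Sequence[Any],
    anomaly_text: str,
    scorer: Scorer,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[int]:
    """Return indices of frames whose score for `anomaly_text` reaches `threshold`."""
    flagged: List[int] = []
    for index, frame in enumerate(frames):
        score = scorer(frame, anomaly_text)
        if score >= threshold:
            flagged.append(index)
    return flagged


def toy_scorer(frame: Set[str], anomaly_text: str) -> float:
    """Stand-in for an LVLM: a 'frame' is a set of tags; a tag match scores 1.0."""
    return 1.0 if anomaly_text in frame else 0.0


if __name__ == "__main__":
    # Four toy "frames", each described by a set of tags instead of pixels.
    toy_frames = [{"walking"}, {"walking", "bicycle"}, {"bicycle"}, {"walking"}]
    # The user defines "bicycle" as the abnormal event.
    print(detect_custom_anomalies(toy_frames, "bicycle", toy_scorer))  # [1, 2]
```

The point of the sketch is that the anomaly definition is an input at inference time rather than something baked in during training, so changing the text to another event requires no retraining in this framing.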
