VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning

Abstract

Video Anomaly Understanding (VAU) is essential for applications such as smart cities, security surveillance, and disaster alert systems, yet it remains challenging because it demands fine-grained spatio-temporal perception and robust reasoning under ambiguity. Despite advances in anomaly detection, existing methods often lack interpretability and struggle to capture the causal and contextual aspects of abnormal events. This limitation is compounded by the absence of comprehensive benchmarks for evaluating reasoning ability in anomaly scenarios. To address both challenges, we introduce VAU-R1, a data-efficient framework built upon Multimodal Large Language Models (MLLMs) that enhances anomaly reasoning through Reinforcement Fine-Tuning (RFT). We also propose VAU-Bench, the first Chain-of-Thought benchmark tailored for video anomaly reasoning, featuring multiple-choice QA, detailed rationales, temporal annotations, and descriptive captions. Empirical results show that VAU-R1 significantly improves question answering accuracy, temporal grounding, and reasoning coherence across diverse contexts. Together, our method and benchmark establish a strong foundation for interpretable and reasoning-aware video anomaly understanding. Our code is available at https://github.com/GVCLab/VAU-R1.
