DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark

Abstract

Video generation techniques have advanced rapidly in recent years. Given the popularity of video content on social media platforms, these models intensify concerns about the spread of fake information. There is therefore a growing demand for detectors that can distinguish fake AI-generated videos from real ones and mitigate the potential harm caused by fake information. However, the lack of large-scale datasets from the most advanced video generators is a barrier to developing such detectors. To address this gap, we introduce GenVideo, the first AI-generated video detection dataset. It has the following characteristics: (1) a large volume of videos, with over one million AI-generated and real videos; (2) a rich diversity of generated content and methodologies, covering a broad spectrum of video categories and generation techniques. We conducted extensive studies of the dataset and proposed two evaluation methods tailored to real-world-like scenarios: the cross-generator video classification task assesses how well trained detectors generalize across generators, and the degraded video classification task evaluates the robustness of detectors on videos whose quality has degraded during dissemination. Moreover, we introduced a plug-and-play module, named Detail Mamba (DeMamba), designed to enhance detectors by identifying AI-generated videos through the analysis of inconsistencies in the temporal and spatial dimensions. Our extensive experiments demonstrate DeMamba's superior generalizability and robustness on GenVideo compared to existing detectors. We believe that the GenVideo dataset and the DeMamba module will significantly advance the field of AI-generated video detection. Our code and dataset will be available at https://github.com/chenhaoxing/DeMamba.
