DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark

Abstract

Video generation techniques have advanced rapidly in recent years. Because video content is so widespread on social media platforms, these models intensify concerns about the spread of misinformation. There is therefore a growing demand for detectors that can identify AI-generated fake videos and mitigate the potential harm of misinformation. However, the lack of large-scale datasets built from the most advanced video generators is a barrier to developing such detectors. To address this gap, we introduce GenVideo, the first AI-generated video detection dataset. It has two main characteristics: (1) a large volume of videos, with over one million AI-generated and real videos; (2) a rich diversity of generated content and methodologies, covering a broad spectrum of video categories and generation techniques. We conducted extensive studies of the dataset and proposed two evaluation methods tailored to real-world-like scenarios to assess detector performance: the cross-generator video classification task assesses how well trained detectors generalize across generators, and the degraded video classification task evaluates how robust detectors are to videos whose quality has degraded during dissemination. Moreover, we introduced a plug-and-play module, named Detail Mamba (DeMamba), designed to enhance detectors by identifying AI-generated videos through the analysis of inconsistencies in the temporal and spatial dimensions. Our extensive experiments demonstrate that DeMamba achieves better generalizability and robustness on GenVideo than existing detectors. We believe that the GenVideo dataset and the DeMamba module will significantly advance the field of AI-generated video detection. Our code and dataset will be available at https://github.com/chenhaoxing/DeMamba.
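As a rough illustration of the cross-generator task described above, the sketch below splits a toy sample list so that the generators seen during training do not appear among the fake videos at test time. This is only a minimal sketch under stated assumptions, not the GenVideo protocol itself: the record layout, the generator names (gen_a, gen_b, gen_c), the function name cross_generator_split, and the rule of alternating real videos between the two sets are all hypothetical.

```python
def cross_generator_split(samples, train_generators):
    """Split samples so test-time fake videos come only from unseen generators.

    samples: list of dicts with an "id" and a "source" key, where "source" is
    either "real" or a generator name (hypothetical layout, for illustration).
    train_generators: set of generator names allowed in the training set.
    """
    train, test = [], []
    real_seen = 0
    for sample in samples:
        if sample["source"] == "real":
            # Assumption: real videos are simply alternated between the sets.
            (train if real_seen % 2 == 0 else test).append(sample)
            real_seen += 1
        elif sample["source"] in train_generators:
            train.append(sample)
        else:
            # Fake videos from generators never seen in training.
            test.append(sample)
    return train, test


# Toy data with hypothetical generator names.
samples = [
    {"id": "v0", "source": "real"},
    {"id": "v1", "source": "gen_a"},
    {"id": "v2", "source": "gen_b"},
    {"id": "v3", "source": "real"},
    {"id": "v4", "source": "gen_c"},
]
train, test = cross_generator_split(samples, {"gen_a"})
print([s["id"] for s in train])
print([s["id"] for s in test])
```

A detector trained on the first list and scored on the second is then measured on generators it has never seen, which is the kind of generalization the cross-generator task is meant to probe.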
