

IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection

June 1, 2025
Authors: Wayne Zhang, Changjiang Jiang, Zhonghao Zhang, Chenyang Si, Fengchang Yu, Wei Peng
cs.AI

Abstract

The rapid advancement of Artificial Intelligence Generated Content (AIGC) in visual domains has resulted in highly realistic synthetic images and videos, driven by sophisticated generative frameworks such as diffusion-based architectures. While these breakthroughs open substantial opportunities, they simultaneously raise critical concerns about content authenticity and integrity. Many current AIGC detection methods operate as black-box binary classifiers, which offer limited interpretability, and no approach supports detecting both images and videos in a unified framework. This dual limitation compromises model transparency, reduces trustworthiness, and hinders practical deployment. To address these challenges, we introduce IVY-FAKE, a novel, unified, and large-scale dataset specifically designed for explainable multimodal AIGC detection. Unlike prior benchmarks, which suffer from fragmented modality coverage and sparse annotations, IVY-FAKE contains over 150,000 richly annotated training samples (images and videos) and 18,700 evaluation examples, each accompanied by detailed natural-language reasoning beyond simple binary labels. Building on this, we propose the Ivy Explainable Detector (IVY-XDETECTOR), a unified architecture that jointly performs explainable AIGC detection for both image and video content. Our unified vision-language model achieves state-of-the-art performance across multiple image and video detection benchmarks, highlighting the significant advancements enabled by our dataset and modeling framework. Our data is publicly available at https://huggingface.co/datasets/AI-Safeguard/Ivy-Fake.
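The abstract describes each sample as pairing a binary real/fake label with a detailed natural-language explanation. A minimal sketch of what such a record might look like is shown below; the field names and structure are illustrative assumptions, not the dataset's actual schema.

```python
# Sketch of an explainable AIGC-detection record in the style described by
# IVY-FAKE: a binary label plus natural-language reasoning, for either an
# image or a video. Field names here are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class AIGCSample:
    path: str       # path or URL of the media file
    modality: str   # "image" or "video"
    label: str      # "real" or "fake" (binary ground truth)
    reasoning: str  # detailed natural-language explanation of the verdict

    def is_fake(self) -> bool:
        """Convenience accessor for the binary label."""
        return self.label == "fake"


# Hypothetical example record (contents invented for illustration).
sample = AIGCSample(
    path="example_0001.mp4",
    modality="video",
    label="fake",
    reasoning=(
        "Temporal flicker in fine hair strands and inconsistent specular "
        "highlights across frames are typical of diffusion-based synthesis."
    ),
)
```

In practice, the published dataset on Hugging Face could be loaded with the `datasets` library and mapped into records of this shape, keeping the reasoning string available for training explanation-generating detectors rather than discarding it in favor of the binary label alone.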