IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection

Abstract

The rapid advancement of Artificial Intelligence Generated Content (AIGC) in visual domains has produced highly realistic synthetic images and videos, driven by sophisticated generative frameworks such as diffusion-based architectures. While these breakthroughs open substantial opportunities, they also raise critical concerns about content authenticity and integrity. Many current AIGC detection methods operate as black-box binary classifiers that offer limited interpretability, and no existing approach detects both images and videos within a unified framework. This dual limitation compromises model transparency, reduces trustworthiness, and hinders practical deployment. To address these challenges, we introduce IVY-FAKE, a novel, unified, large-scale dataset specifically designed for explainable multimodal AIGC detection. Unlike prior benchmarks, which suffer from fragmented modality coverage and sparse annotations, IVY-FAKE contains over 150,000 richly annotated training samples (images and videos) and 18,700 evaluation examples, each accompanied by detailed natural-language reasoning that goes beyond a simple binary label. Building on this dataset, we propose Ivy Explainable Detector (IVY-XDETECTOR), a unified architecture that jointly performs explainable AIGC detection for both image and video content. Our unified vision-language model achieves state-of-the-art performance across multiple image and video detection benchmarks, highlighting the advances enabled by our dataset and modeling framework. Our data is publicly available at https://huggingface.co/datasets/AI-Safeguard/Ivy-Fake.
