Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

Abstract

With the rapid advancement of Artificial Intelligence Generated Content (AIGC) technologies, synthetic images have become increasingly prevalent in everyday life, posing new challenges for authenticity assessment and detection. Although existing methods are effective at evaluating image authenticity and localizing forgeries, they often lack human interpretability and do not fully address the growing complexity of synthetic data. To tackle these challenges, we introduce FakeVLM, a specialized large multimodal model designed for both general synthetic image detection and DeepFake detection. FakeVLM not only excels at distinguishing real from fake images but also provides clear natural-language explanations of image artifacts, enhancing interpretability. Additionally, we present FakeClue, a comprehensive dataset containing over 100,000 images across seven categories, annotated with fine-grained artifact clues in natural language. FakeVLM achieves performance comparable to expert models while eliminating the need for additional classifiers, making it a robust solution for synthetic data detection. Extensive evaluations across multiple datasets confirm the superiority of FakeVLM in both authenticity classification and artifact explanation tasks, setting a new benchmark for synthetic image detection. The dataset and code will be released at https://github.com/opendatalab/FakeVLM.