LEGION: Learning to Ground and Explain for Synthetic Image Detection

March 19, 2025
Authors: Hengrui Kang, Siwei Wen, Zichen Wen, Junyan Ye, Weijia Li, Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, Conghui He
cs.AI

Abstract

The rapid advancement of generative technology is a double-edged sword: it offers powerful tools that enhance convenience, but it also raises significant social concerns. On the defensive side, current synthetic image detection methods often lack artifact-level textual interpretability and focus too heavily on image manipulation detection. Current datasets, meanwhile, usually suffer from outdated generators and a lack of fine-grained annotations. In this paper, we introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. Furthermore, we propose LEGION (LEarning to Ground and explain for Synthetic Image detectiON), a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation. Building upon this capability, we further explore LEGION as a controller, integrating it into image refinement pipelines to guide the generation of higher-quality and more realistic images. Extensive experiments show that LEGION outperforms existing methods across multiple benchmarks; in particular, it surpasses the second-best traditional expert on SynthScars by 3.31% in mIoU and 7.75% in F1 score. Moreover, the refined images generated under its guidance exhibit stronger alignment with human preferences. The code, model, and dataset will be released.
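
The mIoU and F1 figures quoted in the abstract are overlap metrics for segmentation masks. The abstract does not give the paper's exact evaluation protocol, so the following is only a minimal sketch of the standard pixel-level definitions. It assumes binary masks stored as nested lists (1 = artifact pixel), and the helper names `confusion_counts`, `iou`, `f1`, and `mean_iou` are hypothetical, not taken from the LEGION code.

```python
from typing import Sequence, Tuple

# Assumption: a mask is a 2D grid of 0/1 values, 1 marking an artifact pixel.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives per pixel."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    return tp / union if union else 1.0


def f1(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1 score: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average IoU over a set of images (one common reading of 'mIoU')."""
    if not preds or len(preds) != len(truths):
        raise ValueError("preds and truths must be non-empty and equal in length")
    return sum(iou(p, t) for p, t in zip(preds, truths)) / len(preds)


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    annotated = [[1, 0, 0],
                 [0, 1, 1]]
    print(iou(predicted, annotated))  # 2 shared pixels out of 4 in the union
    print(f1(predicted, annotated))
```

Under these definitions, F1 is never lower than IoU on the same pair of masks, which is one reason the two metrics are often reported side by side.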
