LEGION: Learning to Ground and Explain for Synthetic Image Detection
March 19, 2025
Authors: Hengrui Kang, Siwei Wen, Zichen Wen, Junyan Ye, Weijia Li, Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, Conghui He
cs.AI
Abstract
The rapid advancements in generative technology have emerged as a
double-edged sword. While offering powerful tools that enhance convenience,
they also pose significant social concerns. On the defensive side, current synthetic
image detection methods often lack artifact-level textual interpretability and
focus too heavily on image manipulation detection, while existing datasets
usually suffer from outdated generators and a lack of fine-grained annotations.
In this paper, we introduce SynthScars, a high-quality and diverse dataset
consisting of 12,236 fully synthetic images with human-expert annotations. It
features 4 distinct image content types, 3 categories of artifacts, and
fine-grained annotations covering pixel-level segmentation, detailed textual
explanations, and artifact category labels. Furthermore, we propose LEGION
(LEarning to Ground and explain for Synthetic Image detectiON), a multimodal
large language model (MLLM)-based image forgery analysis framework that
integrates artifact detection, segmentation, and explanation. Building upon
this capability, we further explore LEGION as a controller, integrating it into
image refinement pipelines to guide the generation of higher-quality and more
realistic images. Extensive experiments show that LEGION outperforms existing
methods across multiple benchmarks, particularly surpassing the second-best
traditional expert on SynthScars by 3.31% in mIoU and 7.75% in F1 score.
Moreover, the refined images generated under its guidance exhibit stronger
alignment with human preferences. The code, model, and dataset will be
released.