LEGION: Learning to Ground and Explain for Synthetic Image Detection

March 19, 2025
Authors: Hengrui Kang, Siwei Wen, Zichen Wen, Junyan Ye, Weijia Li, Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, Conghui He
cs.AI

Abstract

The rapid advancement of generative technology is a double-edged sword: it offers powerful tools that enhance convenience, but it also raises significant social concerns. On the defensive side, current synthetic image detection methods often lack artifact-level textual interpretability and focus too narrowly on image manipulation detection, while existing datasets usually suffer from outdated generators and a lack of fine-grained annotations. In this paper, we introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. Furthermore, we propose LEGION (LEarning to Ground and explain for Synthetic Image detectiON), a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation. Building on this capability, we further explore LEGION as a controller, integrating it into image refinement pipelines to guide the generation of higher-quality and more realistic images. Extensive experiments show that LEGION outperforms existing methods across multiple benchmarks; on SynthScars in particular, it surpasses the second-best traditional expert by 3.31% in mIoU and 7.75% in F1 score. Moreover, the refined images generated under its guidance align more closely with human preferences. The code, model, and dataset will be released.
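
The abstract reports artifact-segmentation results as mIoU and F1. As a rough illustration of what these two numbers measure, here is a minimal sketch that scores a predicted binary artifact mask against a ground-truth mask. It assumes the standard pixel-level definitions; the paper's exact evaluation protocol is not stated in the abstract. The function names, the toy masks, and the convention of scoring two empty masks as 1.0 are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# Binary mask as nested lists: 1 = artifact pixel, 0 = clean pixel (assumed encoding).
Mask = List[List[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives over all pixels."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return tp / union if union else 1.0


def f1(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_iou(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (prediction, ground truth) pairs, one pair per image."""
    if not pairs:
        raise ValueError("mean_iou needs at least one (pred, truth) pair")
    return sum(iou(p, t) for p, t in pairs) / len(pairs)


# Toy 2x3 masks: two pixels agree, one is a false positive, one a false negative.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
truth_mask = [[1, 0, 0],
              [0, 1, 1]]

example_iou = iou(pred_mask, truth_mask)
example_f1 = f1(pred_mask, truth_mask)
example_miou = mean_iou([(pred_mask, truth_mask), (truth_mask, truth_mask)])
```

On the toy masks the overlap is 2 pixels out of a union of 4, so the IoU is 0.5, while F1 weights the overlap twice and comes out at 2/3. This is why the two metrics can move by different amounts for the same change in prediction quality.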
