LEGION: Learning to Ground and Explain for Synthetic Image Detection

Abstract

Rapid advances in generative technology are a double-edged sword: they provide powerful tools that enhance convenience, but they also raise significant social concerns. On the defensive side, current synthetic image detection methods often lack artifact-level textual interpretability and focus too narrowly on image manipulation detection, while existing datasets usually suffer from outdated generators and a lack of fine-grained annotations. In this paper, we introduce SynthScars, a high-quality and diverse dataset of 12,236 fully synthetic images with human-expert annotations. It covers 4 distinct image content types and 3 categories of artifacts, with fine-grained annotations that include pixel-level segmentation, detailed textual explanations, and artifact category labels. Furthermore, we propose LEGION (LEarning to Ground and explain for Synthetic Image detectiON), a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation. Building on this capability, we further explore LEGION as a controller, integrating it into image refinement pipelines to guide the generation of higher-quality and more realistic images. Extensive experiments show that LEGION outperforms existing methods across multiple benchmarks; on SynthScars in particular, it surpasses the second-best traditional expert by 3.31% in mIoU and 7.75% in F1 score. Moreover, the refined images generated under its guidance align more closely with human preferences. The code, model, and dataset will be released.
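The mIoU and F1 figures reported for SynthScars are pixel-level segmentation metrics. As a rough illustration, the sketch below computes both for binary artifact masks. The function names, the per-image averaging, and the handling of empty masks are assumptions made for this example; the abstract does not specify the exact evaluation protocol, which may for instance average over classes rather than images.

```python
from typing import Sequence, Tuple

# A 2D binary mask: 1 marks an artifact pixel, 0 marks a clean pixel.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives over all pixels."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of the artifact (foreground) regions."""
    tp, fp, fn = confusion_counts(pred, target)
    union = tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    return tp / union if union else 1.0


def f1(pred: Mask, target: Mask) -> float:
    """Pixel-level F1 score, the harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    return 2 * tp / denom if denom else 1.0


def mean_scores(pairs: Sequence[Tuple[Mask, Mask]]) -> Tuple[float, float]:
    """Average IoU and F1 over (prediction, ground truth) pairs, one pair per image."""
    if not pairs:
        raise ValueError("need at least one (prediction, ground truth) pair")
    ious = [iou(p, t) for p, t in pairs]
    f1s = [f1(p, t) for p, t in pairs]
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    annotated = [[1, 0], [1, 0]]
    miou, mean_f1 = mean_scores([(predicted, annotated)])
    print(f"mIoU={miou:.4f}  F1={mean_f1:.4f}")
```

In the toy case, one pixel is shared between the two masks, one is predicted but not annotated, and one is annotated but missed, so the IoU is 1/3 and the F1 score is 1/2.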
