DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection
November 24, 2025
Authors: Hai Ci, Ziheng Peng, Pei Yang, Yingxin Xuan, Mike Zheng Shou
cs.AI
Abstract
Diffusion-based editing enables realistic modification of local image regions, making AI-generated content harder to detect. Existing AIGC detection benchmarks focus on classifying entire images, overlooking the localization of diffusion-based edits. We introduce DiffSeg30k, a publicly available dataset of 30k diffusion-edited images with pixel-level annotations, designed to support fine-grained detection. DiffSeg30k features: 1) In-the-wild images--we collect images or image prompts from COCO to reflect real-world content diversity; 2) Diverse diffusion models--local edits are produced by eight SOTA diffusion models; 3) Multi-turn editing--each image undergoes up to three sequential edits to mimic real-world editing workflows; and 4) Realistic editing scenarios--a vision-language model (VLM)-based pipeline automatically identifies meaningful regions and generates context-aware prompts covering additions, removals, and attribute changes. DiffSeg30k shifts AIGC detection from binary classification to semantic segmentation, enabling simultaneous localization of edits and identification of the editing model. We benchmark three baseline segmentation approaches, revealing that the segmentation task remains challenging, particularly in robustness to image distortions. Experiments also show that segmentation models, despite being trained for pixel-level localization, serve as highly reliable whole-image classifiers of diffusion edits, outperforming established forgery classifiers and showing strong potential for cross-generator generalization. We believe DiffSeg30k will advance research on fine-grained localization of AI-generated content by demonstrating both the promise and the limitations of segmentation-based methods. DiffSeg30k is released at: https://huggingface.co/datasets/Chaos2629/Diffseg30k
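As a quick-start illustration (not taken from the paper), the sketch below loads DiffSeg30k with the HuggingFace `datasets` library and aggregates a pixel-level mask into a whole-image real/edited decision, mirroring the whole-image classification use reported in the experiments. The split name, the `image`/`mask` field names, and the label convention (0 = real pixel, nonzero IDs = editing models) are assumptions; consult the dataset card for the actual schema.

```python
# Minimal sketch: load DiffSeg30k and derive a whole-image label from a
# pixel-level segmentation mask. Field names and label convention are
# assumptions; see the dataset card for the real schema.
import numpy as np
from datasets import load_dataset

ds = load_dataset("Chaos2629/Diffseg30k", split="train")  # split name assumed

sample = ds[0]
# Hypothetical field: per-pixel labels, 0 = real, nonzero = editing-model ID.
mask = np.array(sample["mask"])

# Aggregate pixel labels into a whole-image decision: the image counts as
# edited if any pixel carries a nonzero generator ID.
is_edited = bool((mask > 0).any())
edit_fraction = float((mask > 0).mean())  # share of pixels flagged as edited
print(f"edited={is_edited}, edited area={edit_fraction:.1%}")
```

This kind of aggregation is one simple way to compare a segmentation model against whole-image forgery classifiers on the same benchmark.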