DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection

November 24, 2025
Authors: Hai Ci, Ziheng Peng, Pei Yang, Yingxin Xuan, Mike Zheng Shou
cs.AI

Abstract

Diffusion-based editing enables realistic modification of local image regions, making AI-generated content harder to detect. Existing AIGC detection benchmarks focus on classifying entire images, overlooking the localization of diffusion-based edits. We introduce DiffSeg30k, a publicly available dataset of 30k diffusion-edited images with pixel-level annotations, designed to support fine-grained detection. DiffSeg30k features: 1) In-the-wild images: we collect images or image prompts from COCO to reflect real-world content diversity; 2) Diverse diffusion models: local edits are made with eight SOTA diffusion models; 3) Multi-turn editing: each image undergoes up to three sequential edits to mimic real-world editing workflows; and 4) Realistic editing scenarios: a vision-language model (VLM)-based pipeline automatically identifies meaningful regions and generates context-aware prompts covering additions, removals, and attribute changes. DiffSeg30k shifts AIGC detection from binary classification to semantic segmentation, enabling simultaneous localization of edits and identification of the editing model. We benchmark three baseline segmentation approaches, revealing significant challenges in the segmentation task, particularly robustness to image distortions. Experiments also show that segmentation models, despite being trained for pixel-level localization, emerge as highly reliable whole-image classifiers of diffusion edits, outperforming established forgery classifiers and showing strong potential for cross-generator generalization. We believe DiffSeg30k will advance research on fine-grained localization of AI-generated content by demonstrating both the promise and the limitations of segmentation-based methods. DiffSeg30k is released at: https://huggingface.co/datasets/Chaos2629/Diffseg30k
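
For readers who want to explore the data, a minimal loading sketch using the Hugging Face datasets library is shown below. The split name and the column names (`image`, `mask`) are assumptions made for illustration; the dataset card at the link above documents the actual schema.

```python
# Minimal sketch, not the authors' loader: fetch DiffSeg30k from the
# Hugging Face Hub. The split name and the column names below are
# assumptions for illustration; see the dataset card for the real schema.
from datasets import load_dataset

ds = load_dataset("Chaos2629/Diffseg30k", split="train")  # split name assumed

sample = ds[0]
image = sample["image"]  # edited image (PIL), field name assumed
mask = sample["mask"]    # pixel-level label map of edited regions, field name assumed
```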
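The observation that segmentation models double as whole-image classifiers can be read as a simple aggregation: an image is edited if any of its pixels is confidently predicted as edited. The following sketch illustrates one such max-pooling readout in PyTorch, assuming a 9-way per-pixel output (one authentic class plus the eight diffusion editors); it is an illustrative reduction, not the paper's exact evaluation protocol.

```python
import torch

def image_level_score(logits: torch.Tensor) -> torch.Tensor:
    """Collapse per-pixel segmentation logits into one edited-vs-authentic score.

    logits: (C, H, W) tensor of per-pixel class logits, where channel 0 is
    assumed to be the 'authentic' class and channels 1..C-1 the editing models.
    Returns a scalar in [0, 1]: the maximum probability, over all pixels,
    that a pixel belongs to any edit class.
    """
    probs = logits.softmax(dim=0)   # per-pixel class probabilities
    edited = probs[1:].sum(dim=0)   # P(pixel was edited), shape (H, W)
    return edited.max()             # an image is edited if any region is

# Dummy usage: 9 classes = 1 authentic + 8 diffusion models, 256x256 image.
logits = torch.randn(9, 256, 256)
print(image_level_score(logits).item())
```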