
D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection

October 7, 2025
Authors: Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu
cs.AI

Abstract

The emergence of visual autoregressive (AR) models has revolutionized image generation while presenting new challenges for synthetic image detection. Unlike previous GAN- or diffusion-based methods, AR models generate images through discrete token prediction, exhibiting both marked improvements in image synthesis quality and unique characteristics in their vector-quantized representations. In this paper, we propose to leverage Discrete Distribution Discrepancy-aware Quantization Error (D^3QE) for autoregressive-generated image detection, exploiting the distinctive patterns and codebook frequency distribution bias that differ between real and fake images. We introduce a discrete distribution discrepancy-aware transformer that integrates dynamic codebook frequency statistics into its attention mechanism, fusing semantic features with the quantization-error latent representation. To evaluate our method, we construct a comprehensive dataset termed ARForensics covering 7 mainstream visual AR models. Experiments demonstrate superior detection accuracy and strong generalization of D^3QE across different AR models, with robustness to real-world perturbations. Code is available at https://github.com/Zhangyr2022/D3QE.
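To make the two signals the abstract refers to concrete, the following is a minimal sketch (not code from the paper) of how one could compute, for a batch of latent vectors, the per-vector quantization error against a VQ codebook and the codebook usage-frequency histogram that D^3QE-style detection builds on. The function name `vq_stats` and all shapes are illustrative assumptions.

```python
import numpy as np

def vq_stats(features, codebook):
    """Quantize each feature vector to its nearest codebook entry.

    features: (N, D) array of latent vectors.
    codebook: (K, D) array of codebook entries.
    Returns (quantization_error, usage_frequency):
      quantization_error: (N, D) residual between each vector and its entry.
      usage_frequency:    (K,) fraction of vectors mapped to each entry.
    """
    # Pairwise squared distances between every feature and every entry.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)                      # nearest-entry index per vector
    error = features - codebook[idx]             # per-vector quantization error
    freq = np.bincount(idx, minlength=len(codebook)) / len(features)
    return error, freq

# Toy example with random features and a small random codebook.
rng = np.random.default_rng(0)
feats = rng.normal(size=(128, 8))
book = rng.normal(size=(16, 8))
err, freq = vq_stats(feats, book)
print(err.shape)        # (128, 8)
print(freq.sum())       # 1.0
```

A detector in this spirit would compare such error magnitudes and frequency histograms between real images and AR-generated ones, where the generator's discrete token prediction biases codebook usage.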