D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
October 7, 2025
Authors: Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu
cs.AI
Abstract
The emergence of visual autoregressive (AR) models has revolutionized image
generation while presenting new challenges for synthetic image detection.
Unlike previous GAN or diffusion-based methods, AR models generate images
through discrete token prediction, exhibiting both marked improvements in image
synthesis quality and unique characteristics in their vector-quantized
representations. In this paper, we propose Discrete Distribution
Discrepancy-aware Quantization Error (D^3QE) for autoregressive-generated
image detection, which exploits the distinctive codebook patterns and
frequency-distribution biases that separate real from fake images. We
introduce a discrete distribution discrepancy-aware transformer that integrates
dynamic codebook frequency statistics into its attention mechanism, fusing
semantic features with latent quantization-error information. To evaluate our method, we
construct a comprehensive dataset termed ARForensics covering 7 mainstream
visual AR models. Experiments demonstrate superior detection accuracy and
strong generalization of D^3QE across different AR models, with robustness to
real-world perturbations. Code is available at
https://github.com/Zhangyr2022/D3QE.
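The quantization error and codebook frequency statistics the abstract refers to can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); the function name, shapes, and random inputs below are illustrative assumptions. It shows the two raw signals D^3QE builds on: the residual left after snapping each latent vector to its nearest codebook entry, and the empirical usage frequency of each code.

```python
import numpy as np

def vq_quantization_error(features, codebook):
    """Quantize each latent vector to its nearest codebook entry.

    features: (N, D) array of latent vectors
    codebook: (K, D) array of code embeddings
    Returns the per-vector quantization residual and the
    empirical usage frequency of each code.
    """
    # pairwise squared distances between latents and codes: (N, K)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                 # nearest code index per vector
    error = features - codebook[idx]        # quantization residual
    freq = np.bincount(idx, minlength=len(codebook)) / len(features)
    return error, freq

# toy example with random data (purely illustrative)
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))   # 16 codes of dimension 8
feats = rng.normal(size=(100, 8))     # 100 latent vectors
err, freq = vq_quantization_error(feats, codebook)
print(err.shape)  # (100, 8)
```

In the paper's setting, the distribution of `freq` (which codes an AR generator actually uses) and the statistics of `err` differ between real and generated images; the proposed transformer consumes these signals alongside semantic features rather than the raw residual alone.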