D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
October 7, 2025
Authors: Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu
cs.AI
Abstract
The emergence of visual autoregressive (AR) models has revolutionized image
generation while presenting new challenges for synthetic image detection.
Unlike previous GAN or diffusion-based methods, AR models generate images
through discrete token prediction, exhibiting both marked improvements in image
synthesis quality and unique characteristics in their vector-quantized
representations. In this paper, we propose Discrete Distribution
Discrepancy-aware Quantization Error (D^3QE) for autoregressive-generated
image detection, which exploits the distinctive codebook patterns and
frequency-distribution biases that separate real from fake images. We
introduce a discrete distribution discrepancy-aware transformer that integrates
dynamic codebook frequency statistics into its attention mechanism, fusing
semantic features with latent quantization-error information. To evaluate our method, we
construct a comprehensive dataset termed ARForensics covering 7 mainstream
visual AR models. Experiments demonstrate superior detection accuracy and
strong generalization of D^3QE across different AR models, with robustness to
real-world perturbations. Code is available at
https://github.com/Zhangyr2022/D3QE.
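The quantization error and codebook frequency statistics the abstract refers to can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); the function name, shapes, and random inputs below are illustrative assumptions. It shows the two raw signals D^3QE builds on: the residual left after snapping each latent vector to its nearest codebook entry, and the empirical usage frequency of each code.

```python
import numpy as np

def vq_quantization_error(features, codebook):
    """Quantize each latent vector to its nearest codebook entry.

    features: (N, D) array of latent vectors
    codebook: (K, D) array of code embeddings
    Returns the per-vector quantization residual and the
    empirical usage frequency of each code.
    """
    # pairwise squared distances between latents and codes: (N, K)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                 # nearest code index per vector
    error = features - codebook[idx]        # quantization residual
    freq = np.bincount(idx, minlength=len(codebook)) / len(features)
    return error, freq

# toy example with random data (purely illustrative)
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))   # 16 codes of dimension 8
feats = rng.normal(size=(100, 8))     # 100 latent vectors
err, freq = vq_quantization_error(feats, codebook)
print(err.shape)  # (100, 8)
```

In the paper's setting, the distribution of `freq` (which codes an AR generator actually uses) and the statistics of `err` differ between real and generated images; the proposed transformer consumes these signals alongside semantic features rather than the raw residual alone.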