
Scaling Laws for Deepfake Detection

October 18, 2025
Authors: Wenhao Wang, Longqi Cai, Taihong Xiao, Yuxiao Wang, Ming-Hsuan Yang
cs.AI

Abstract

This paper presents a systematic study of scaling laws for the deepfake detection task. Specifically, we analyze how model performance scales with the number of real image domains, deepfake generation methods, and training images. Since no existing dataset meets the scale requirements for this research, we construct ScaleDF, the largest dataset to date in this field, which contains over 5.8 million real images from 51 different datasets (domains) and more than 8.8 million fake images generated by 102 deepfake methods. Using ScaleDF, we observe power-law scaling similar to that observed in large language models (LLMs): the average detection error follows a predictable power-law decay as either the number of real domains or the number of deepfake methods increases. This key observation not only allows us to forecast the number of additional real domains or deepfake methods required to reach a target performance, but also inspires us to counter evolving deepfake technology in a data-centric manner. Beyond this, we examine the role of pre-training and data augmentation in deepfake detection under scaling, as well as the limitations of scaling itself.
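To make the forecasting idea concrete, here is a minimal sketch in Python that fits a power-law decay of the form E(n) = a * n^(-b) + c to (number of real domains, average detection error) pairs and then inverts the fit to estimate how many domains would be needed to reach a target error. The functional form, the data points, and the target value below are illustrative assumptions for this sketch, not figures reported in the paper.

```python
# Sketch: fit a power-law decay to hypothetical detection-error measurements
# and forecast the number of real domains needed for a target error.
# All numbers here are made up for illustration; they are not from ScaleDF.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Assumed form of average detection error vs. number of real domains."""
    return a * n ** (-b) + c

# Hypothetical (num_domains, avg_error) observations.
domains = np.array([2, 4, 8, 16, 32, 51], dtype=float)
errors = np.array([0.32, 0.24, 0.18, 0.14, 0.11, 0.096])

# Fit the power-law parameters.
(a, b, c), _ = curve_fit(power_law, domains, errors, p0=(0.5, 0.5, 0.05))

# Forecast: invert E(n) = a * n**(-b) + c to find n for a target error.
target = 0.08
needed = ((target - c) / a) ** (-1.0 / b)
print(f"fitted parameters: a={a:.3f}, b={b:.3f}, c={c:.3f}")
print(f"estimated domains for error <= {target}: ~{int(np.ceil(needed))}")
```

The same fitting-and-inversion step applies if n counts deepfake generation methods instead of real domains; only the observations change.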