

Rapidly Adapting to New Voice Spoofing: Few-Shot Detection of Synthesized Speech Under Distribution Shifts

August 18, 2025
作者: Ashi Garg, Zexin Cai, Henry Li Xinyuan, Leibny Paola García-Perera, Kevin Duh, Sanjeev Khudanpur, Matthew Wiesner, Nicholas Andrews
cs.AI

Abstract

We address the challenge of detecting synthesized speech under distribution shifts -- arising from synthesis methods, speakers, languages, or audio conditions unseen in the training data. Few-shot learning methods are a promising way to tackle distribution shifts by rapidly adapting on the basis of a few in-distribution samples. We propose a self-attentive prototypical network to enable more robust few-shot adaptation. To evaluate our approach, we systematically compare the performance of traditional zero-shot detectors and the proposed few-shot detectors, carefully controlling training conditions to introduce distribution shifts at evaluation time. In conditions where distribution shifts hamper zero-shot performance, our proposed few-shot adaptation technique can quickly adapt using as few as 10 in-distribution samples -- achieving up to a 32% relative equal error rate (EER) reduction on Japanese-language deepfakes and a 20% relative reduction on the ASVspoof 2021 Deepfake dataset.
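To make the adaptation mechanism concrete, below is a minimal sketch of few-shot classification with a self-attentive prototypical network. It is an illustration of the general technique named in the abstract, not the authors' implementation: the encoder is stood in for by random features, and the layer sizes, attention placement, and distance metric are assumptions.

```python
# Minimal sketch (PyTorch) of a self-attentive prototypical head for
# few-shot spoof detection. All design choices here are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class SelfAttentivePrototypicalHead(nn.Module):
    def __init__(self, embed_dim=256, num_heads=4):
        super().__init__()
        # Self-attention refines each support embedding in the context of
        # the other support samples before class prototypes are formed.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, support, support_labels, query):
        # support: (n_support, d) embeddings from some speech encoder
        # support_labels: (n_support,) with 0 = bona fide, 1 = spoofed
        # query: (n_query, d) embeddings to classify
        refined, _ = self.attn(support.unsqueeze(0), support.unsqueeze(0),
                               support.unsqueeze(0))
        refined = refined.squeeze(0)                        # (n_support, d)
        prototypes = torch.stack([refined[support_labels == c].mean(dim=0)
                                  for c in (0, 1)])         # (2, d)
        # Negative squared Euclidean distance to each prototype is the logit.
        return -torch.cdist(query, prototypes) ** 2         # (n_query, 2)

# Usage: 10-shot adaptation with random features standing in for encoder outputs.
head = SelfAttentivePrototypicalHead()
support = torch.randn(10, 256)
labels = torch.tensor([0] * 5 + [1] * 5)
query = torch.randn(3, 256)
logits = head(support, labels, query)   # higher logit -> predicted class
```

In this sketch, adapting to a new synthesis method, speaker, or language only requires recomputing prototypes from the handful of in-distribution support samples; no gradient updates are needed at adaptation time.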