Proactive Detection of Voice Cloning with Localized Watermarking

Abstract

Speech generative models are evolving rapidly, and there is a pressing need to verify audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal uses a generator/detector architecture trained jointly with two losses: a localization loss that enables watermark detection down to the sample level, and a novel perceptual loss, inspired by auditory masking, that makes the watermark less perceptible. AudioSeal achieves state-of-the-art robustness to real-life audio manipulations and state-of-the-art imperceptibility, as measured by both automatic and human evaluation metrics. In addition, AudioSeal uses a fast, single-pass detector that runs up to two orders of magnitude faster than existing models, which makes it well suited to large-scale and real-time applications.
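The abstract says that a localization loss lets the detector flag watermarked audio down to the sample level, and that detection runs in a single pass. Below is a minimal sketch of what that can look like. It assumes the detector emits one watermark probability per audio sample and that the localization loss is a per-sample binary cross-entropy against a 0/1 mask of watermarked samples. Both are illustrative assumptions, not the paper's published code. The helper names `localization_loss`, `watermarked_segments` and `watermarked_fraction`, and the 0.5 threshold, are hypothetical.

```python
import math


def localization_loss(probs, mask, eps=1e-7):
    """Mean per-sample binary cross-entropy (an ASSUMED form of the localization loss).

    probs: hypothetical detector output, one probability per audio sample
           that the sample carries the watermark.
    mask:  ground truth, 1 where the watermark was embedded, 0 elsewhere.
    """
    if len(probs) != len(mask) or not probs:
        raise ValueError("probs and mask must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def watermarked_segments(probs, threshold=0.5):
    """Group consecutive samples scoring >= threshold into [start, end) index spans."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a flagged run begins here
        elif p < threshold and start is not None:
            segments.append((start, i))  # the run ended at the previous sample
            start = None
    if start is not None:  # the run extends to the end of the clip
        segments.append((start, len(probs)))
    return segments


def watermarked_fraction(probs, threshold=0.5):
    """Share of samples flagged as watermarked, usable as a simple clip-level score."""
    return sum(p >= threshold for p in probs) / len(probs)


if __name__ == "__main__":
    # Toy 10-sample clip in which only samples 3..6 were watermarked.
    mask = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
    probs = [0.1, 0.2, 0.1, 0.9, 0.8, 0.95, 0.7, 0.3, 0.1, 0.05]
    print(watermarked_segments(probs))  # [(3, 7)]
    print(watermarked_fraction(probs))  # 0.4
    print(localization_loss(probs, mask))
```

Under these assumptions, the detector scores every sample in one forward pass. Localizing the watermark then reduces to thresholding those scores and grouping runs of flagged samples. The sketch does not model the generator, the auditory-masking perceptual loss, or the training loop.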
