Proactive Detection of Voice Cloning with Localized Watermarking

Abstract

In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss, which enables localized watermark detection down to the sample level, and a novel perceptual loss inspired by auditory masking, which enables AudioSeal to achieve better imperceptibility. AudioSeal achieves state-of-the-art performance in terms of robustness to real-life audio manipulations and imperceptibility, based on automatic and human evaluation metrics. Additionally, AudioSeal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster, which makes it well suited to large-scale and real-time applications.
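To make the idea of sample-level localization concrete, here is a minimal post-processing sketch. It assumes a detector that emits one watermark probability per audio sample, as the abstract suggests; the function name `localize_watermark`, the threshold of 0.5, and the `min_length` filter are hypothetical choices for illustration and are not taken from AudioSeal's code.

```python
from typing import List, Sequence, Tuple


def localize_watermark(
    probs: Sequence[float],
    threshold: float = 0.5,   # assumed decision threshold, not from the paper
    min_length: int = 1,      # assumed filter that drops very short runs
) -> List[Tuple[int, int]]:
    """Group per-sample probabilities into half-open [start, end) index ranges.

    A sample counts as watermarked when its probability is strictly above
    `threshold`. Consecutive watermarked samples form one segment.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current run began, or None outside a run
    for i, p in enumerate(probs):
        if p > threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at i - 1; keep it only if it is long enough.
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the signal.
    if start is not None and len(probs) - start >= min_length:
        segments.append((start, len(probs)))
    return segments


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.95, 0.99, 0.1]
    print(localize_watermark(scores))
```

Dividing the returned indices by the sampling rate converts each segment to seconds. A real pipeline would likely also smooth the scores before thresholding, which this sketch leaves out.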
