
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

June 27, 2023
Authors: Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang, Qian Chen
cs.AI

Abstract

Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is simultaneously recorded by multiple Devices located at different Distances, and some of whom speak multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of diverse blends of speech representation entanglement, thereby motivating intriguing methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with out-of-domain learning and self-supervised learning methods. https://3dspeaker.github.io/
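
To make the controlled-combination design concrete, the sketch below shows one way the corpus's device/distance/dialect dimensions could be exploited: pairing recordings of the same speaker made under different channel conditions, which isolates speaker identity from device and distance factors. This is a minimal illustration only; the metadata field names and example values are hypothetical and do not reflect the corpus's actual schema or tooling.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Utterance:
    # Hypothetical metadata record for one recording; field names
    # are illustrative, not the corpus's actual schema.
    speaker_id: str
    device: str        # e.g. "headset", "phone", "far-field array"
    distance_m: float  # microphone-to-speaker distance in meters
    dialect: str       # e.g. "Mandarin", "Sichuanese"
    path: str          # path to the audio file

def same_speaker_cross_condition_pairs(utterances):
    """Yield pairs of utterances from the same speaker recorded under
    different device/distance conditions. Such pairs share speaker
    identity while differing in channel factors, the kind of controlled
    contrast a simultaneous multi-device, multi-distance recording
    setup makes possible."""
    for a, b in combinations(utterances, 2):
        if a.speaker_id == b.speaker_id and \
                (a.device, a.distance_m) != (b.device, b.distance_m):
            yield a, b

if __name__ == "__main__":
    # Toy metadata; real entries would be loaded from the corpus.
    utts = [
        Utterance("spk0001", "headset", 0.2, "Mandarin", "spk0001_headset.wav"),
        Utterance("spk0001", "far-field array", 3.0, "Mandarin", "spk0001_array.wav"),
        Utterance("spk0002", "phone", 0.5, "Sichuanese", "spk0002_phone.wav"),
    ]
    for a, b in same_speaker_cross_condition_pairs(utts):
        print(a.path, "<->", b.path)
```

Analogous groupings (same device across speakers, same speaker across dialects) would yield the other cells of the entanglement matrix the abstract describes.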