3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
June 27, 2023
Authors: Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang, Qian Chen
cs.AI
Abstract
Disentangling uncorrelated information in speech utterances is a crucial
research topic within the speech community. Different speech-related tasks
focus on extracting distinct speech representations while minimizing the
effects of other uncorrelated information. We present a large-scale speech
corpus to facilitate research on speech representation disentanglement.
3D-Speaker contains over 10,000 speakers, each of whom is simultaneously
recorded by multiple Devices located at different Distances, and some of whom
speak multiple Dialects. The controlled combinations of multi-dimensional
audio data yield a matrix of diverse speech representation entanglements,
thereby motivating intriguing methods to untangle them. The multi-domain
nature of 3D-Speaker also makes it a suitable resource for evaluating large
universal speech models and for experimenting with methods of out-of-domain
learning and self-supervised learning. https://3dspeaker.github.io/
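The "matrix" of controlled combinations can be illustrated with a minimal sketch, assuming each utterance carries hypothetical metadata fields (`speaker`, `device`, `distance_m`, `dialect`). These field names, the `Utterance` class, the `single_factor_pairs` helper, and the toy records below are illustrative only; they are not the official 3D-Speaker metadata format or tooling. The idea shown is that, because the factors are recorded in controlled combinations, one can select pairs from the same speaker that differ in exactly one factor, which isolates that factor (for example, device) from the others.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

# Hypothetical per-utterance metadata; not the official corpus format.
@dataclass(frozen=True)
class Utterance:
    utt_id: str
    speaker: str
    device: str
    distance_m: float
    dialect: str

FACTORS = ("device", "distance_m", "dialect")

def single_factor_pairs(utts, vary):
    """Return (utt_id, utt_id) pairs from the same speaker that differ in
    the factor `vary` while matching on every other factor in FACTORS."""
    if vary not in FACTORS:
        raise ValueError(f"unknown factor: {vary}")
    fixed = [f for f in FACTORS if f != vary]

    # Group utterances by speaker so every pair shares speaker identity.
    by_speaker = defaultdict(list)
    for u in utts:
        by_speaker[u.speaker].append(u)

    pairs = []
    for group in by_speaker.values():
        for a, b in combinations(group, 2):
            differs = getattr(a, vary) != getattr(b, vary)
            same_rest = all(getattr(a, f) == getattr(b, f) for f in fixed)
            if differs and same_rest:
                pairs.append((a.utt_id, b.utt_id))
    return pairs

# Toy records (hypothetical values, for illustration only).
utts = [
    Utterance("u1", "spk1", "phone", 0.5, "dialect_A"),
    Utterance("u2", "spk1", "laptop", 0.5, "dialect_A"),
    Utterance("u3", "spk1", "phone", 3.0, "dialect_A"),
    Utterance("u4", "spk1", "phone", 0.5, "dialect_B"),
    Utterance("u5", "spk2", "phone", 0.5, "dialect_A"),
]

print(single_factor_pairs(utts, "device"))      # [('u1', 'u2')]
print(single_factor_pairs(utts, "distance_m"))  # [('u1', 'u3')]
print(single_factor_pairs(utts, "dialect"))     # [('u1', 'u4')]
```

Pairs like these could, under these assumptions, serve as positives for a speaker-invariance objective or as probes for how much a representation changes along a single axis; how the corpus itself is intended to be used is described at the project URL above.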