LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
May 30, 2023
Authors: Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna
cs.AI
Abstract
This paper introduces a new speech dataset called "LibriTTS-R" designed for
text-to-speech (TTS) use. It is derived by applying speech restoration to the
LibriTTS corpus, which consists of 585 hours of speech data at 24 kHz sampling
rate from 2,456 speakers and the corresponding texts. The constituent samples
of LibriTTS-R are identical to those of LibriTTS, with only the sound quality
improved. Experimental results show that the LibriTTS-R ground-truth samples
have significantly improved sound quality compared to those in LibriTTS. In
addition, neural end-to-end TTS trained with LibriTTS-R achieved speech
naturalness on par with that of the ground-truth samples. The corpus is freely
available for download from http://www.openslr.org/141/.