PolyVoice: Language Models for Speech to Speech Translation
June 5, 2023
Authors: Qianqian Dong, Zhiying Huang, Chen Xu, Yunlong Zhao, Kexin Wang, Xuxin Cheng, Tom Ko, Qiao Tian, Tang Li, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang
cs.AI
Abstract
We propose PolyVoice, a language model-based framework for speech-to-speech
translation (S2ST). Our framework consists of two language models: a
translation language model and a speech synthesis language model. We use
discretized speech units, which are generated in a fully unsupervised way, and
thus our framework can be used for unwritten languages. For the speech
synthesis part, we adopt the existing VALL-E X approach and build a unit-based
audio language model. This grants our framework the ability to preserve the
voice characteristics and the speaking style of the original speech. We examine
our system on Chinese → English and English → Spanish
pairs. Experimental results show that our system can generate speech with high
translation quality and audio quality. Speech samples are available at
https://speechtranslation.github.io/polyvoice.
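
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid quantizer stands in for whatever unsupervised unit extraction the system uses (the abstract does not name the method), and `S2STPipeline`, `translate_units`, `synthesize`, and the toy functions are hypothetical names introduced only for illustration. Passing the source units as a prompt to the synthesis step is likewise an assumption about how voice and style might be carried over.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Units = List[int]


def quantize(frames: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]]) -> Units:
    """Map each feature frame to the index of its nearest centroid.

    Assumption: the centroids come from some unsupervised clustering
    (for example k-means); the abstract does not specify the method.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(centroids)), key=lambda k: sq_dist(f, centroids[k]))
            for f in frames]


@dataclass
class S2STPipeline:
    # Hypothetical interfaces: each "language model" is just a callable here.
    translate_units: Callable[[Units], Units]          # source units -> target units
    synthesize: Callable[[Units, Units], List[float]]  # (target units, prompt) -> waveform

    def __call__(self, frames, centroids) -> List[float]:
        src_units = quantize(frames, centroids)
        tgt_units = self.translate_units(src_units)
        # Assumption: the source units serve as a prompt so the synthesis
        # model can, in principle, preserve voice and speaking style.
        return self.synthesize(tgt_units, src_units)


# Toy stand-ins so the sketch runs end to end; they are not real models.
def toy_translate(units: Units) -> Units:
    return list(reversed(units))


def toy_synthesize(tgt: Units, prompt: Units) -> List[float]:
    return [float(u) for u in tgt]


pipeline = S2STPipeline(toy_translate, toy_synthesize)
centroids = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.8, 0.7]]
waveform = pipeline(frames, centroids)
```

Because both stages operate on discrete units rather than text, nothing in this flow requires a written form of either language, which is the property the abstract highlights for unwritten languages.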