Zero-shot Cross-lingual Voice Transfer for TTS
September 20, 2024
Authors: Fadi Biadsy, Youzheng Chen, Isaac Elias, Kyle Kastner, Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran
cs.AI
Abstract
In this paper, we introduce a zero-shot Voice Transfer (VT) module that can
be seamlessly integrated into a multi-lingual Text-to-speech (TTS) system to
transfer an individual's voice across languages. Our proposed VT module
comprises a speaker-encoder that processes reference speech, a bottleneck
layer, and residual adapters, connected to preexisting TTS layers. We compare
the performance of various configurations of these components and report Mean
Opinion Score (MOS) and Speaker Similarity across languages. Using a single
English reference speech per speaker, we achieve an average voice transfer
similarity score of 73% across nine target languages. Vocal characteristics
contribute significantly to the construction and perception of individual
identity. The loss of one's voice, due to physical or neurological conditions,
can lead to a profound sense of loss, impacting one's core identity. As a case
study, we demonstrate that our approach can not only transfer typical speech
but also restore the voices of individuals with dysarthria, even when only
atypical speech samples are available - a valuable utility for those who have
never had typical speech or banked their voice. Cross-lingual typical audio
samples, plus videos demonstrating voice restoration for dysarthric speakers,
are available here
(google.github.io/tacotron/publications/zero_shot_voice_transfer).