MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
August 8, 2024
Authors: Jiawei Huang, Chen Zhang, Yi Ren, Ziyue Jiang, Zhenhui Ye, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao
cs.AI
Abstract
Voice conversion aims to modify the source speaker's voice to resemble the
target speaker while preserving the original speech content. Despite notable
recent advances in voice conversion, multi-lingual voice conversion
(including both monolingual and cross-lingual scenarios) has yet to be
extensively studied. It faces two main challenges: 1) the considerable
variability in prosody and articulation habits across languages; and 2) the
rarity of paired multi-lingual datasets from the same speaker. In this paper,
we propose MulliVC, a novel voice conversion system that only converts timbre
and keeps original content and source language prosody without multi-lingual
paired data. Specifically, each training step of MulliVC contains three
substeps: in step one, the model is trained with monolingual speech data; then,
steps two and three take inspiration from back translation and construct a
cyclical process to disentangle the timbre and other information (content,
prosody, and other language-related information) in the absence of
multi-lingual data from the same speaker. Both objective and subjective results
indicate that MulliVC significantly surpasses other methods in both monolingual
and cross-lingual contexts, demonstrating the system's efficacy and the
viability of the three-step approach with cycle consistency. Audio samples can
be found on our demo page (mullivc.github.io).
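
The three substeps described above can be illustrated with a toy sketch. The abstract does not specify the model architecture or the loss functions, so everything here is an assumption made for illustration: the `Utterance` container, the `ideal_convert` and `leaky_convert` functions, the L1 distance, and the choice to use a second utterance from the same speaker as the timbre reference in substep one are all hypothetical. The sketch only shows why a cycle (convert to another speaker's timbre, then convert back) gives a training signal without paired multi-lingual data from the same speaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    content: tuple  # stands in for content, prosody, and language-related information
    timbre: tuple   # stands in for speaker timbre


def ideal_convert(source, reference):
    """Hypothetical perfect converter: keep source content, take reference timbre."""
    return Utterance(content=source.content, timbre=reference.timbre)


def leaky_convert(source, reference, leak=0.25):
    """Hypothetical imperfect converter: part of the reference content leaks through."""
    mixed = tuple(
        (1 - leak) * s + leak * r
        for s, r in zip(source.content, reference.content)
    )
    return Utterance(content=mixed, timbre=reference.timbre)


def l1(a, b):
    """Mean absolute difference between two equal-length tuples."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distance(u, v):
    """Toy distance between utterances (assumed; not the paper's loss)."""
    return l1(u.content, v.content) + l1(u.timbre, v.timbre)


def training_step_losses(convert, utt_a, utt_a_other, utt_b):
    """Compute the two toy losses for one training step.

    utt_a, utt_a_other: two utterances from speaker A (same language).
    utt_b: an utterance from speaker B, who speaks a different language.
    """
    # Substep 1 (assumed form): monolingual reconstruction with a
    # same-speaker timbre reference, so a ground truth exists.
    recon = convert(utt_a, utt_a_other)
    loss_mono = distance(recon, utt_a)

    # Substep 2: cross-lingual conversion to speaker B's timbre.
    # No ground truth exists for this output.
    pseudo = convert(utt_a, utt_b)

    # Substep 3: convert back to speaker A's timbre and compare with the
    # original utterance (cycle consistency).
    cycled = convert(pseudo, utt_a_other)
    loss_cycle = distance(cycled, utt_a)
    return loss_mono, loss_cycle


utt_a = Utterance(content=(1.0, 0.0, 2.0), timbre=(0.5, 0.5))
utt_a_other = Utterance(content=(0.0, 3.0, 1.0), timbre=(0.5, 0.5))
utt_b = Utterance(content=(4.0, 4.0, 4.0), timbre=(-1.0, 2.0))

ideal_losses = training_step_losses(ideal_convert, utt_a, utt_a_other, utt_b)
leaky_losses = training_step_losses(leaky_convert, utt_a, utt_a_other, utt_b)
print("ideal:", ideal_losses)
print("leaky:", leaky_losses)
```

In this toy setting, a converter that fully separates timbre from the remaining information yields zero for both losses, while one that lets reference content leak into the output is penalized by both. A real system would replace these functions with neural networks operating on speech features.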