MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
August 8, 2024
Authors: Jiawei Huang, Chen Zhang, Yi Ren, Ziyue Jiang, Zhenhui Ye, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao
cs.AI
Abstract
Voice conversion aims to modify the source speaker's voice to resemble the
target speaker while preserving the original speech content. Despite notable
recent advances in voice conversion, multi-lingual voice conversion
(including both monolingual and cross-lingual scenarios) has yet to be
extensively studied. It faces two main challenges: 1) the considerable
variability in prosody and articulation habits across languages; and 2) the
scarcity of paired multi-lingual data from the same speaker. In this paper,
we propose MulliVC, a novel voice conversion system that converts only timbre,
preserving the original content and the source-language prosody, without
requiring paired multi-lingual data. Specifically, each training step of
MulliVC contains three substeps: in step one, the model is trained with
monolingual speech data; then steps two and three, inspired by back
translation, construct a cyclical process that disentangles timbre from other
information (content, prosody, and other language-related information) in the
absence of multi-lingual data from the same speaker. Both objective and subjective results
indicate that MulliVC significantly surpasses other methods in both monolingual
and cross-lingual contexts, demonstrating the system's efficacy and the
viability of the three-step approach with cycle consistency. Audio samples can
be found on our demo page (mullivc.github.io).
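
The three substeps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify model architecture or loss functions, so every name here (`Utterance`, `ideal_convert`, `leaky_convert`, `training_step_losses`) is hypothetical, the L1 losses are an assumption, and the choice of a same-speaker reference in substep one is one plausible reading. Real speech entangles timbre with content; the toy keeps the two factors as separate fields only to make the control flow of the cycle visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class Utterance:
    """Toy stand-in for speech, with the two factors kept explicit."""
    content: Vector  # content, prosody, and other language-related information
    timbre: Vector   # speaker identity


Converter = Callable[[Utterance, Utterance], Utterance]


def l1(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distance(a: Utterance, b: Utterance) -> float:
    """Assumed loss: L1 on content plus L1 on timbre."""
    return l1(a.content, b.content) + l1(a.timbre, b.timbre)


def ideal_convert(source: Utterance, reference: Utterance) -> Utterance:
    """Perfectly disentangled converter: source content, reference timbre."""
    return Utterance(list(source.content), list(reference.timbre))


def leaky_convert(source: Utterance, reference: Utterance) -> Utterance:
    """Imperfect converter that leaks 20% of the reference content."""
    content = [0.8 * s + 0.2 * r
               for s, r in zip(source.content, reference.content)]
    return Utterance(content, list(reference.timbre))


def training_step_losses(convert: Converter,
                         src: Utterance,
                         same_speaker_ref: Utterance,
                         other_lang_ref: Utterance) -> Dict[str, float]:
    # Substep 1 (monolingual): the reference comes from the same speaker,
    # so the source utterance serves as its own target (assumption).
    mono = convert(src, same_speaker_ref)
    # Substep 2 (cross-lingual): no ground truth exists for this output,
    # so only its timbre is compared with the reference (assumption).
    pseudo = convert(src, other_lang_ref)
    # Substep 3 (cycle): convert the pseudo output back to the original
    # speaker and compare with the original, as in back translation.
    cycled = convert(pseudo, same_speaker_ref)
    return {
        "mono": distance(mono, src),
        "timbre": l1(pseudo.timbre, other_lang_ref.timbre),
        "cycle": distance(cycled, src),
    }


# Speaker A (two utterances, same timbre) and speaker B (another language).
a1 = Utterance(content=[1.0, 2.0], timbre=[0.5, 0.5])
a2 = Utterance(content=[3.0, 0.0], timbre=[0.5, 0.5])
b1 = Utterance(content=[9.0, 9.0], timbre=[-1.0, 1.0])

ideal = training_step_losses(ideal_convert, a1, a2, b1)
leaky = training_step_losses(leaky_convert, a1, a2, b1)
```

In this toy setting the ideal converter drives all three losses to zero, while the leaky converter still matches the target timbre yet incurs a nonzero cycle loss. That is the intuition behind the cycle: content leaking in from the reference cannot be detected in substep two alone, but it shows up once the output is converted back and compared with the original.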