VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

June 8, 2024
Authors: Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei
cs.AI

Abstract

This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in the decoding history. It not only stabilizes the decoding but also circumvents the infinite loop issue. Grouped Code Modeling organizes codec codes into groups to effectively shorten the sequence length, which not only boosts inference speed but also addresses the challenges of long sequence modeling. Our experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach human parity on these benchmarks. Moreover, VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases. The advantages of this work could contribute to valuable endeavors, such as generating speech for individuals with aphasia or people with amyotrophic lateral sclerosis. Demos of VALL-E 2 will be posted to https://aka.ms/valle2.
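
The two enhancements named in the abstract can be illustrated with a small Python sketch. The abstract does not give the exact rules, so everything below is an assumption made for illustration: the function names, the window size, the repetition threshold, the fallback to sampling from the full distribution, and the zero padding are all hypothetical and are not taken from the authors' implementation. The idea shown is (1) nucleus sampling that checks how often the drawn token already appears in recent decoding history and resamples when it repeats too much, and (2) packing a codec code sequence into fixed-size groups so the model sees fewer steps.

```python
import random
from typing import List, Optional, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Draw a token id from the smallest top-ranked set whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: Sequence[int],
    top_p: float = 0.8,      # illustrative value, not from the source
    window: int = 10,        # illustrative value, must be >= 1
    max_ratio: float = 0.1,  # illustrative value, not from the source
    rng: Optional[random.Random] = None,
) -> int:
    """Nucleus sampling with a repetition check (assumed rule, see lead-in)."""
    if rng is None:
        rng = random.Random()
    token = nucleus_sample(probs, top_p, rng)
    recent = list(history[-window:])
    if recent:
        ratio = recent.count(token) / len(recent)
        if ratio > max_ratio:
            # Assumed fallback: draw from the full distribution instead of the
            # truncated nucleus, which gives the decoder a way out of a loop.
            token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token


def group_codes(codes: Sequence[int], group_size: int, pad_id: int = 0) -> List[List[int]]:
    """Pack a flat code sequence into groups, padding the last group if needed."""
    padded = list(codes) + [pad_id] * (-len(codes) % group_size)
    return [padded[i:i + group_size] for i in range(0, len(padded), group_size)]
```

With a group size of 2, a sequence of 7 codes becomes 4 groups, so a model that consumes one group per step would run roughly half as many steps. Likewise, when the recent history is dominated by one token, the repetition check lets lower-probability tokens outside the nucleus be drawn.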
