Matcha-TTS: A fast TTS architecture with conditional flow matching

Abstract

We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM). This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching. Careful design choices additionally ensure each synthesis step is fast to run. The method is probabilistic, non-autoregressive, and learns to speak from scratch without external alignments. Compared to strong pre-trained baseline models, the Matcha-TTS system has the smallest memory footprint, rivals the speed of the fastest models on long utterances, and attains the highest mean opinion score in a listening test. Please see https://shivammehta25.github.io/Matcha-TTS/ for audio examples, code, and pre-trained models.
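To make the two central ideas of the abstract concrete (an OT-CFM training target and an ODE-based decoder sampled in a few steps), here is a minimal toy sketch in pure Python. It is not the authors' implementation. It assumes the standard OT-CFM formulation from the flow-matching literature, in which the training point is x_t = (1 - (1 - sigma_min) t) x0 + t x1 and the regression target is u_t = x1 - (1 - sigma_min) x0. The abstract itself states no formulas, so the names `ot_cfm_pair`, `cfm_loss`, and `euler_sample`, as well as the value of `SIGMA_MIN`, are hypothetical and chosen only for illustration.

```python
# Toy sketch of OT-CFM training targets and few-step Euler sampling.
# Assumption: standard OT-CFM path from the flow-matching literature;
# all names and constants here are illustrative, not from Matcha-TTS code.

SIGMA_MIN = 1e-4  # assumed small constant; the abstract gives no value


def ot_cfm_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return (x_t, u_t): a point on the straight path from noise x0
    towards data x1 at time t, and the constant target velocity."""
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(predict, x1, t, noise):
    """Mean squared error between a predicted vector field and the
    OT-CFM target. `predict(x_t, t)` stands in for the decoder network."""
    x_t, u_t = ot_cfm_pair(noise, x1, t)
    v = predict(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(v, u_t)) / len(u_t)


def euler_sample(vector_field, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = vector_field(x, i * dt)
        x = [a + dt * b for a, b in zip(x, v)]
    return x
```

Because the conditional path for a single target is a straight line with constant velocity, Euler integration of that conditional field is exact even with very few steps. A learned decoder only approximates the marginal field, so this is intuition for why few synthesis steps can suffice, not a guarantee.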
