Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization

August 15, 2024
Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee
cs.AI

Abstract

This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model trained via adversarial flow matching optimization. Recently, conditional flow matching (CFM) generative models have been successfully adopted for waveform generation tasks, using a single vector field estimation objective for training. Although these models can generate high-fidelity waveform signals, they require significantly more ODE steps than GAN-based models, which need only a single generation step. In addition, the generated samples often lack high-frequency information because noisy vector field estimation fails to ensure high-frequency reproduction. To address these limitations, we enhance pre-trained CFM-based generative models by incorporating a fixed-step generator modification. We use reconstruction losses and adversarial feedback to accelerate high-fidelity waveform generation. With adversarial flow matching optimization, only 1,000 steps of fine-tuning are required to achieve state-of-the-art performance across various objective metrics. Moreover, we significantly reduce the number of inference steps from 16 to 2 or 4. Finally, by scaling up the backbone of PeriodWave from 29M to 70M parameters for improved generalization, PeriodWave-Turbo achieves unprecedented performance, with a perceptual evaluation of speech quality (PESQ) score of 4.454 on the LibriTTS dataset. Audio samples, source code and checkpoints will be available at https://github.com/sh-lee-prml/PeriodWave.
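To make the step-count comparison concrete, the following is a minimal sketch of fixed-step Euler sampling along a learned vector field, which is the generic way a flow matching model turns noise into a sample. It is not the PeriodWave-Turbo implementation: `euler_fixed_step` and `make_toy_field` are hypothetical names, and the toy field stands in for a trained network by pointing straight at a known target. The point it illustrates is only that each ODE step costs one vector field evaluation, so going from 16 steps to 2 or 4 cuts the number of network calls accordingly.

```python
from typing import Callable, List

Vector = List[float]
VectorField = Callable[[Vector, float], Vector]


def euler_fixed_step(vector_field: VectorField, x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with a fixed number of Euler steps.

    Each step calls the vector field exactly once, so the cost of sampling
    grows linearly with num_steps.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    x = list(x0)
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h  # t stays strictly below 1, so the toy field never divides by zero
        v = vector_field(x, t)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


def make_toy_field(target: Vector) -> VectorField:
    """Hypothetical stand-in for a trained model (assumption, not from the paper).

    Along the straight path x_t = (1 - t) * x0 + t * target, the velocity that
    carries the current point to the target by t=1 is (target - x) / (1 - t).
    """
    def field(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return field


if __name__ == "__main__":
    target = [0.5, -0.25, 1.0]
    noise = [0.0, 0.0, 0.0]
    for steps in (16, 4, 2):
        sample = euler_fixed_step(make_toy_field(target), noise, steps)
        print(steps, sample)
```

With this idealized field every step count reaches the target, which is exactly what a real, noisily estimated vector field does not guarantee; per the abstract, the fine-tuning with reconstruction losses and adversarial feedback is what lets the few-step generator remain high-fidelity.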
