PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
August 14, 2024
Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee
cs.AI
Abstract
Recently, universal waveform generation has been investigated under
various out-of-distribution scenarios. Although GAN-based
methods have shown their strength in fast waveform generation, they are
vulnerable to train-inference mismatch scenarios such as two-stage
text-to-speech. Meanwhile, diffusion-based models have shown powerful
generative performance in other domains; however, they have received little
attention in waveform generation because of their slow inference. Above all,
there is no generator architecture that can explicitly disentangle the natural
periodic features of high-resolution waveform signals. In this paper, we
propose PeriodWave, a novel universal waveform generation model. First, we
introduce a period-aware flow matching estimator that can capture the periodic
features of the waveform signal when estimating the vector fields.
Additionally, we utilize a multi-period estimator that avoids overlaps to
capture different periodic features of waveform signals. Although increasing
the number of periods can improve performance significantly, it also raises
the computational cost. To mitigate this, we also propose a single
period-conditional universal estimator that can run its feed-forward pass in
parallel via period-wise batch inference. Additionally, we utilize discrete wavelet
transform to losslessly disentangle the frequency information of waveform
signals for high-frequency modeling, and introduce FreeU to reduce the
high-frequency noise for waveform generation. The experimental results
demonstrate that our model outperforms previous models in both
Mel-spectrogram reconstruction and text-to-speech tasks. All source code will
be available at https://github.com/sh-lee-prml/PeriodWave.
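
Two ideas from the abstract can be illustrated with a minimal sketch: folding a 1D waveform by a period so that samples one period apart line up, and a one-level discrete wavelet transform that splits a signal into low and high bands and reconstructs it exactly. This is not the PeriodWave implementation. The function names, the zero padding, the choice of the Haar wavelet, and the example period set are all assumptions made for illustration.

```python
import math


def reshape_by_period(wave, period):
    """Fold a 1D waveform into rows of length `period`.

    Column j of the result holds samples j, j + period, j + 2 * period, ...
    The tail is zero-padded (an illustrative choice, not taken from the paper).
    """
    pad = (-len(wave)) % period
    padded = list(wave) + [0.0] * pad
    return [padded[i:i + period] for i in range(0, len(padded), period)]


def haar_dwt(wave):
    """One-level Haar DWT: returns (low, high) bands at half the length."""
    if len(wave) % 2 != 0:
        raise ValueError("waveform length must be even")
    s = math.sqrt(2.0)
    low = [(wave[i] + wave[i + 1]) / s for i in range(0, len(wave), 2)]
    high = [(wave[i] - wave[i + 1]) / s for i in range(0, len(wave), 2)]
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / s)
        out.append((l - h) / s)
    return out


# A toy sinusoid that repeats every 8 samples.
wave = [math.sin(2.0 * math.pi * n / 8.0) for n in range(32)]

# Hypothetical period set; one 2D view of the waveform per period.
views = {p: reshape_by_period(wave, p) for p in (2, 3, 5, 7)}

# The transform is invertible up to floating-point rounding.
low, high = haar_dwt(wave)
restored = haar_idwt(low, high)
max_error = max(abs(a - b) for a, b in zip(wave, restored))
```

Each entry of `views` has a different shape, so a single estimator conditioned on the period could in principle process the views as one batch, which is the kind of period-wise batch inference the abstract describes. `max_error` stays at the level of floating-point rounding, which is what "lossless" means in this context.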