PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
August 14, 2024
Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee
cs.AI
Abstract
Recently, universal waveform generation has been investigated under various
out-of-distribution scenarios. Although GAN-based methods are strong at fast
waveform generation, they are vulnerable to train-inference mismatch
scenarios such as two-stage text-to-speech. Meanwhile, diffusion-based models
have shown powerful generative performance in other domains; however, they
have received little attention in waveform generation because of their slow
inference speed. Above all, no existing generator architecture can explicitly
disentangle the natural periodic features of high-resolution waveform
signals. In this paper, we propose PeriodWave, a novel universal waveform
generation model. First, we introduce a period-aware flow matching estimator
that captures the periodic features of the waveform signal when estimating
the vector fields. Additionally, we use a multi-period estimator that avoids
overlaps to capture different periodic features of waveform signals. Although
increasing the number of periods can improve performance significantly, it
also increases the computational cost. To mitigate this, we also propose a
single period-conditional universal estimator that can run feed-forward in
parallel through period-wise batch inference. Additionally, we use the
discrete wavelet transform to losslessly disentangle the frequency
information of waveform signals for high-frequency modeling, and introduce
FreeU to reduce high-frequency noise in waveform generation. The experimental
results demonstrate that our model outperforms previous models in both
Mel-spectrogram reconstruction and text-to-speech tasks. All source code will
be available at https://github.com/sh-lee-prml/PeriodWave.
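
Two ideas in the abstract can be illustrated with a small NumPy sketch: viewing a waveform at a given period, and splitting it losslessly into frequency bands with a discrete wavelet transform. This is not the authors' implementation. The function names, the period values, and the choice of a one-level Haar wavelet are all assumptions made for illustration, and the abstract does not state which periods or wavelet PeriodWave uses.

```python
import numpy as np


def to_period_view(wave, period):
    """Reshape a 1D waveform into a 2D array of shape (frames, period).

    The tail is zero-padded so the length is a multiple of `period`.
    Samples that are exactly one period apart end up adjacent along axis 0.
    (Hypothetical helper, not taken from the PeriodWave code.)
    """
    pad = (-len(wave)) % period
    padded = np.pad(wave, (0, pad))
    return padded.reshape(-1, period)


def haar_dwt(wave):
    """One-level Haar DWT of an even-length signal: returns (low, high) bands."""
    even, odd = wave[0::2], wave[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleaves the reconstructed even and odd samples."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    out = np.empty(low.size * 2, dtype=np.float64)
    out[0::2] = even
    out[1::2] = odd
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wave = rng.standard_normal(16)

    # Illustrative periods only; the paper's actual configuration may differ.
    for period in (2, 3, 5):
        print(period, to_period_view(wave, period).shape)

    # The two half-rate bands together recover the input up to rounding error.
    low, high = haar_dwt(wave)
    print(np.allclose(haar_idwt(low, high), wave))
```

Each period view has the same layout regardless of the period value, so several views could in principle be processed together as one batch, which is the intuition behind the period-wise batch inference mentioned in the abstract. The round trip through `haar_dwt` and `haar_idwt` shows what "lossless" means here: the low and high bands carry all the information of the original signal.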