

Music Consistency Models

April 20, 2024
Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang
cs.AI

Abstract

Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. This has proven advantageous in mitigating the computational burden associated with diffusion models. Nevertheless, the application of consistency models to music generation remains largely unexplored. To address this gap, we present Music Consistency Models (MusicCM), which leverages the concept of consistency models to efficiently synthesize mel-spectrograms for music clips, maintaining high quality while minimizing the number of sampling steps. Building upon existing text-to-music diffusion models, the MusicCM model incorporates consistency distillation and adversarial discriminator training. Moreover, we find it beneficial to generate extended, coherent music by combining multiple diffusion processes with shared constraints. Experimental results demonstrate the effectiveness of our model in terms of computational efficiency, fidelity, and naturalness. Notably, MusicCM achieves seamless music synthesis with a mere four sampling steps, i.e., only one second per minute of generated music, showcasing its potential for real-time applications.
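
The abstract describes few-step generation with a consistency model that maps a noisy mel-spectrogram directly to a clean estimate, repeated over four noise levels. The following is a minimal PyTorch sketch of that four-step sampling loop only; the network architecture, the sigma schedule, and the placeholder text conditioning are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ConsistencyNet(nn.Module):
    """Toy stand-in for a text-conditioned consistency model: given a noisy
    mel-spectrogram, a noise level, and a per-frame text condition, it
    predicts the clean mel-spectrogram in a single forward pass."""

    def __init__(self, mel_bins=80, cond_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(mel_bins + 1 + cond_dim, 256),
            nn.SiLU(),
            nn.Linear(256, mel_bins),
        )

    def forward(self, x, sigma, cond):
        # Broadcast the scalar noise level across the batch and time axes.
        s = sigma.expand(x.shape[0], x.shape[1], 1)
        return self.net(torch.cat([x, s, cond], dim=-1))

@torch.no_grad()
def sample(model, cond, frames=256, mel_bins=80,
           sigmas=(80.0, 24.0, 5.0, 0.5)):
    """Multistep consistency sampling over four noise levels: denoise to a
    clean estimate, re-noise to the next (lower) level, and repeat."""
    x = sigmas[0] * torch.randn(1, frames, mel_bins)
    for i, sigma in enumerate(sigmas):
        x0 = model(x, torch.tensor([[[sigma]]]), cond)  # predicted clean mel
        if i + 1 < len(sigmas):
            x = x0 + sigmas[i + 1] * torch.randn_like(x0)
        else:
            x = x0
    return x  # mel-spectrogram; a vocoder would render it to a waveform

cond = torch.zeros(1, 256, 16)  # hypothetical per-frame text embedding
mel = sample(ConsistencyNet(), cond)
print(mel.shape)  # torch.Size([1, 256, 80])

Each of the four loop iterations costs one network evaluation, which is what makes the reported one-second-per-minute synthesis plausible compared with the dozens or hundreds of denoising steps a standard diffusion sampler requires.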
