Music Consistency Models
April 20, 2024
Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang
cs.AI
Abstract
Consistency models have exhibited remarkable capabilities in facilitating
efficient image/video generation, enabling synthesis with minimal sampling
steps. They have proven advantageous in mitigating the computational burdens
associated with diffusion models. Nevertheless, the application of consistency
models in music generation remains largely unexplored. To address this gap, we
present Music Consistency Models (MusicCM), which leverage the
concept of consistency models to efficiently synthesize mel-spectrograms for
music clips, maintaining high quality while minimizing the number of sampling
steps. Building upon existing text-to-music diffusion models, the
MusicCM model incorporates consistency distillation and adversarial
discriminator training. Moreover, we find it beneficial to generate extended
coherent music by incorporating multiple diffusion processes with shared
constraints. Experimental results reveal the effectiveness of our model in
terms of computational efficiency, fidelity, and naturalness. Notably,
MusicCM achieves seamless music synthesis with a mere four sampling
steps, e.g., only one second per minute of music clip, showcasing the
potential for real-time applications.
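The four-step synthesis mentioned above follows the general multistep consistency-sampling recipe: one large denoising jump from pure noise, then a few re-noise-and-jump refinements. The sketch below illustrates that loop only; `consistency_multistep_sample`, its arguments, and the consistency function `f` (mapping a noisy mel-spectrogram at noise level `t` to a clean estimate) are hypothetical names for illustration, not the authors' implementation.

```python
import numpy as np

def consistency_multistep_sample(f, shape, timesteps, rng):
    """Few-step consistency sampling (illustrative sketch only).

    f(x, t)   -- hypothetical trained consistency function: maps a noisy
                 sample at noise level t directly to a clean estimate.
    timesteps -- decreasing noise levels, e.g. 4 entries for the
                 four-step synthesis described in the abstract.
    """
    t0 = timesteps[0]
    x = rng.standard_normal(shape) * t0   # start from pure Gaussian noise
    sample = f(x, t0)                     # first one-shot denoising jump
    for t in timesteps[1:]:
        # re-noise the current estimate down to level t, then jump again
        x = sample + t * rng.standard_normal(shape)
        sample = f(x, t)
    return sample
```

Because each iteration is a single evaluation of `f`, four steps cost four network calls, which is what makes the reported one-second-per-minute generation plausible for real-time use.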