Music Style Transfer with Time-Varying Inversion of Diffusion Models

February 21, 2024
Authors: Sifei Li, Yuxin Zhang, Fan Tang, Chongyang Ma, Weiming Dong, Changsheng Xu
cs.AI

Abstract

With the development of diffusion models, text-guided image style transfer has demonstrated high-quality controllable synthesis results. However, the utilization of text for diverse music style transfer poses significant challenges, primarily due to the limited availability of matched audio-text datasets. Music, being an abstract and complex art form, exhibits variations and intricacies even within the same genre, thereby making accurate textual descriptions challenging. This paper presents a music style transfer approach that effectively captures musical attributes using minimal data. We introduce a novel time-varying textual inversion module to precisely capture mel-spectrogram features at different levels. During inference, we propose a bias-reduced stylization technique to obtain stable results. Experimental results demonstrate that our method can transfer the style of specific instruments, as well as incorporate natural sounds to compose melodies. Samples and source code are available at https://lsfhuihuiff.github.io/MusicTI/.
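To make the idea of a time-varying textual inversion module concrete, below is a minimal PyTorch sketch of one plausible realization: a pseudo-token embedding that changes with the diffusion timestep, here interpolated from a small bank of learnable anchor embeddings. The class name, the anchor-interpolation scheme, and all dimensions (`TimeVaryingTokenEmbedding`, `num_anchors`, 768-dim embeddings) are assumptions for illustration only, not the authors' implementation.

```python
# A minimal sketch of time-varying textual inversion, assuming a
# Textual-Inversion-style setup where a pseudo-token embedding is optimized
# while the diffusion backbone and text encoder stay frozen. All names and
# hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn


class TimeVaryingTokenEmbedding(nn.Module):
    """Maps a diffusion timestep t to a pseudo-token embedding v(t).

    Unlike vanilla textual inversion, which learns one static embedding,
    conditioning on t lets early (noisy) steps encode coarse mel-spectrogram
    structure and late steps encode fine-grained timbre.
    """

    def __init__(self, embed_dim: int = 768, num_timesteps: int = 1000,
                 num_anchors: int = 10):
        super().__init__()
        self.num_timesteps = num_timesteps
        # Small bank of learnable anchor embeddings, interpolated over t.
        self.anchors = nn.Parameter(torch.randn(num_anchors, embed_dim) * 0.01)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # Map t to a fractional position in [0, num_anchors - 1], then
        # linearly interpolate between the two nearest anchors.
        pos = t.float() / (self.num_timesteps - 1) * (self.anchors.shape[0] - 1)
        lo = pos.floor().long().clamp(max=self.anchors.shape[0] - 2)
        frac = (pos - lo.float()).unsqueeze(-1)
        return (1 - frac) * self.anchors[lo] + frac * self.anchors[lo + 1]


# During training, v(t) would replace a placeholder token ("*") in the prompt
# embedding sequence at each denoising step; only `anchors` receives gradients.
embedder = TimeVaryingTokenEmbedding()
t = torch.randint(0, 1000, (4,))   # a batch of sampled timesteps
v_t = embedder(t)                  # (4, 768) timestep-dependent embeddings
```

The design choice sketched here, interpolating a few anchors rather than learning a separate embedding per timestep, keeps the number of trainable parameters small, which matches the paper's stated goal of capturing musical attributes from minimal data.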
