VampNet: Music Generation via Masked Acoustic Token Modeling

Abstract

We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training, which lets us sample coherent music from the model at inference time by applying a variety of masking patterns (called prompts). VampNet is non-autoregressive: it uses a bidirectional transformer architecture that attends to all tokens in each forward pass. With just 36 sampling passes, VampNet can generate coherent, high-fidelity musical waveforms. We show that by prompting VampNet in different ways, we can apply it to tasks such as music compression, inpainting, outpainting, continuation, and looping with variation (vamping). When appropriately prompted, VampNet can preserve the style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting makes VampNet a powerful tool for music co-creation. Code and audio samples are available online.
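The masking ideas in the abstract (a variable masking ratio during training, masks used as prompts, and a fixed number of parallel sampling passes) can be illustrated with a toy, stdlib-only sketch. It assumes a MaskGIT-style cosine schedule, a single token per time step (a real codec produces several codebooks per step), and greedy confidence-ordered commits rather than sampling from the model's distribution. The `MASK` sentinel, every function name, and `toy_predict` are hypothetical illustrations, not VampNet's code or API.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical sentinel id marking a masked token position

# predict(seq) -> one (token, confidence) pair per position. It stands in for
# the bidirectional transformer, which sees every position in a single pass.
Predictor = Callable[[List[int]], List[Tuple[int, float]]]


def cosine_schedule(progress: float) -> float:
    """Fraction of positions still masked at progress in [0, 1] (assumed cosine)."""
    return math.cos(progress * math.pi / 2)


def training_mask(n: int, rng: random.Random) -> List[bool]:
    """Variable masking for training: draw a random ratio, mask that many positions.

    True means the token is kept as context; False means it must be predicted.
    """
    ratio = cosine_schedule(rng.random())  # in (0, 1] because random() < 1
    n_masked = min(n, math.ceil(ratio * n))
    masked = set(rng.sample(range(n), n_masked))
    return [i not in masked for i in range(n)]


def periodic_prompt(n: int, period: int) -> List[bool]:
    """Keep every `period`-th time step as an anchor; regenerate the rest."""
    return [t % period == 0 for t in range(n)]


def inpaint_prompt(n: int, prefix: int, suffix: int) -> List[bool]:
    """Keep the first `prefix` and last `suffix` steps; regenerate the middle."""
    return [t < prefix or t >= n - suffix for t in range(n)]


def iterative_decode(tokens: Sequence[int], keep: Sequence[bool],
                     predict: Predictor, steps: int) -> List[int]:
    """Fill masked positions in `steps` passes, committing confident tokens first."""
    seq = [tok if k else MASK for tok, k in zip(tokens, keep)]
    masked = [i for i, tok in enumerate(seq) if tok == MASK]
    total = len(masked)
    for step in range(1, steps + 1):
        if not masked:
            break
        preds = predict(seq)  # one forward pass over all positions
        # How many positions should remain masked after this pass
        # (reaches 0 on the final pass because cos(pi / 2) is ~0).
        n_left = math.floor(total * cosine_schedule(step / steps))
        n_commit = max(1, len(masked) - n_left)
        # Commit the most confident predictions; the rest stay masked.
        ranked = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[:n_commit]:
            seq[i] = preds[i][0]
        masked = ranked[n_commit:]
    return seq


def toy_predict(seq: List[int]) -> List[Tuple[int, float]]:
    # Hypothetical stand-in model: guesses token i % 4, more confident early on.
    return [(i % 4, 1.0 / (1 + i)) for i in range(len(seq))]


if __name__ == "__main__":
    out = iterative_decode([7] * 12, periodic_prompt(12, 4), toy_predict, steps=4)
    print(out)  # → [7, 1, 2, 3, 7, 1, 2, 3, 7, 1, 2, 3]
```

In this sketch the task is chosen entirely by the mask: swapping `periodic_prompt` for `inpaint_prompt`, or for a prefix-only mask to continue a clip, changes what gets regenerated without touching the model, which mirrors the prompting flexibility the abstract describes.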
