VampNet: Music Generation via Masked Acoustic Token Modeling

Abstract

We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
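The prompting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat one-dimensional token sequence, the periodic prompt, and the cosine unmasking schedule are all assumptions made here for illustration. Only the task names (inpainting, continuation) and the figure of 36 sampling passes come from the abstract. Each prompt is a boolean mask over token positions, where `True` marks a token the model would regenerate and `False` marks a token kept as context.

```python
import math

# Hypothetical helpers; True = masked (to be generated), False = kept as context.

def inpainting_mask(n_tokens, keep_prefix, keep_suffix):
    """Keep the start and end of the sequence, mask the middle."""
    return [keep_prefix <= i < n_tokens - keep_suffix for i in range(n_tokens)]

def continuation_mask(n_tokens, keep_prefix):
    """Keep a prefix and mask everything after it."""
    return [i >= keep_prefix for i in range(n_tokens)]

def periodic_mask(n_tokens, period):
    """Keep every `period`-th token and mask the rest (assumed prompt style)."""
    return [i % period != 0 for i in range(n_tokens)]

def masked_counts(n_masked, n_passes=36):
    """Tokens still masked after each pass, under an assumed cosine schedule.

    The count shrinks from nearly n_masked to 0 over n_passes passes.
    """
    counts = []
    for step in range(1, n_passes + 1):
        frac = math.cos(math.pi / 2 * step / n_passes)
        counts.append(int(math.floor(frac * n_masked)))
    return counts

if __name__ == "__main__":
    mask = inpainting_mask(n_tokens=12, keep_prefix=3, keep_suffix=3)
    print(mask)
    print(masked_counts(sum(mask)))
```

In a real system the mask would presumably be applied to the acoustic token grid produced by an audio codec, and a model would fill in the masked positions pass by pass. The sketch only shows how different prompts reduce to different masks over the same sequence.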