Simple and Controllable Music Generation
June 8, 2023
Authors: Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez
cs.AI
Abstract
We tackle the task of conditional music generation. We introduce MusicGen, a
single Language Model (LM) that operates over several streams of compressed
discrete music representations, i.e., tokens. Unlike prior work, MusicGen
comprises a single-stage transformer LM together with efficient token
interleaving patterns, which eliminates the need for cascading several models,
e.g., hierarchically or via upsampling. Following this approach, we demonstrate
how MusicGen can generate high-quality samples while being conditioned on a
textual description or melodic features, allowing better control over the
generated output. We conduct an extensive empirical evaluation, considering
both automatic and human studies, showing that the proposed approach is
superior to the evaluated baselines on a standard text-to-music benchmark.
Through ablation studies, we shed light on the importance of each component
comprising MusicGen. Music samples, code, and models are available at
https://github.com/facebookresearch/audiocraft.
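The "token interleaving" idea in the abstract can be sketched as follows. This is an illustrative delay-style pattern only, not the paper's actual implementation: the function name `delay_interleave` and the `PAD` value are our own, and MusicGen evaluates several such patterns. The point it illustrates is that K parallel codebook streams can be flattened into a single sequence, with stream k offset by k steps, so one single-stage LM can model all streams jointly instead of cascading separate models.

```python
# Minimal sketch (not the official AudioCraft code) of a "delay"
# interleaving pattern over K parallel codebook token streams.

PAD = -1  # hypothetical padding id for positions with no real token


def delay_interleave(streams):
    """Flatten K equal-length token streams into one sequence of steps.

    streams: list of K lists, each with T token ids (one per codebook).
    Returns a list of T + K - 1 steps; at step t, stream k contributes
    its token at position t - k (delayed by k), or PAD if out of range.
    """
    K = len(streams)
    T = len(streams[0])
    out = []
    for t in range(T + K - 1):  # delays extend the sequence by K - 1
        step = []
        for k in range(K):
            src = t - k  # stream k is delayed by k positions
            step.append(streams[k][src] if 0 <= src < T else PAD)
        out.append(step)
    return out


# Example: 3 codebooks, 4 time steps.
streams = [[10, 11, 12, 13],
           [20, 21, 22, 23],
           [30, 31, 32, 33]]
pattern = delay_interleave(streams)
# pattern[0] == [10, PAD, PAD]; pattern[2] == [12, 21, 30]
```

Because each step still emits one token per codebook, the LM's sequence length grows only by K - 1 steps rather than by a factor of K, which is what makes a single-stage model practical here.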