Simple and Controllable Music Generation
June 8, 2023
Authors: Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez
cs.AI
Abstract
We tackle the task of conditional music generation. We introduce MusicGen, a
single Language Model (LM) that operates over several streams of compressed
discrete music representation, i.e., tokens. Unlike prior work, MusicGen is
composed of a single-stage transformer LM together with efficient token
interleaving patterns, which eliminates the need for cascading several models,
e.g., hierarchically or via upsampling. Following this approach, we demonstrate
how MusicGen can generate high-quality samples while being conditioned on a
textual description or melodic features, allowing better control over the
generated output. We conduct extensive empirical evaluation, considering both
automatic metrics and human studies, showing that the proposed approach is
superior to the evaluated baselines on a standard text-to-music benchmark.
Through ablation studies, we shed light on the importance of each of the
components comprising MusicGen. Music samples, code, and models are available
at https://github.com/facebookresearch/audiocraft.
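
The key idea of modeling several parallel token streams with a single-stage LM can be illustrated with a small sketch. The code below is not the authors' implementation; it shows one plausible "delay" interleaving pattern, in which stream k is shifted right by k positions so that one transformer step emits a token for every codebook while earlier codebooks still precede later ones. The `PAD` sentinel is a hypothetical placeholder token introduced here for illustration.

```python
# Illustrative sketch of a "delay" token-interleaving pattern for K parallel
# codebook streams (an assumption about how interleaving might work, not the
# paper's exact scheme). PAD is a hypothetical padding-token id.
PAD = -1

def delay_interleave(streams):
    """Shift stream k right by k steps, producing a K x (T + K - 1) grid."""
    k_count = len(streams)
    t_len = len(streams[0])
    total = t_len + k_count - 1
    grid = []
    for k, stream in enumerate(streams):
        # k leading pads, the original tokens, then trailing pads to align rows
        row = [PAD] * k + list(stream) + [PAD] * (total - t_len - k)
        grid.append(row)
    return grid

def delay_deinterleave(grid):
    """Invert the delay pattern, recovering the original K streams."""
    k_count = len(grid)
    t_len = len(grid[0]) - k_count + 1
    return [grid[k][k:k + t_len] for k in range(k_count)]
```

A round trip over two toy streams (`delay_deinterleave(delay_interleave(streams)) == streams`) confirms the pattern is lossless; the benefit is that a single autoregressive model can predict one column of the grid per step instead of requiring a cascade of per-codebook models.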