ChatMusician: Understanding and Generating Music Intrinsically with LLM
February 25, 2024
Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
cs.AI
Abstract
While Large Language Models (LLMs) demonstrate impressive capabilities in
text generation, we find that their ability has yet to be generalized to music,
humanity's creative language. We introduce ChatMusician, an open-source LLM
that integrates intrinsic musical abilities. It is built by continually
pre-training and finetuning LLaMA2 on a text-compatible music representation,
ABC notation, treating music as a second language. ChatMusician can
understand and generate music with a pure text tokenizer, without any external
multi-modal neural structures or tokenizers. Interestingly, endowing the model
with musical abilities does not harm its language abilities; it even achieves
a slightly higher MMLU score. Our model is capable of composing
well-structured, full-length music conditioned on texts, chords, melodies,
motifs, musical forms, etc., surpassing the GPT-4 baseline. On our meticulously
curated college-level music understanding benchmark, MusicTheoryBench,
ChatMusician surpasses LLaMA2 and GPT-3.5 in the zero-shot setting by a
noticeable margin. Our work reveals that LLMs can be excellent compressors for
music, but significant territory remains to be conquered. We release our
4B-token music-language corpus MusicPile, the collected MusicTheoryBench,
code, model, and demo on GitHub.
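
For readers unfamiliar with ABC notation, the text-compatible music representation named in the abstract: a tune is encoded as plain ASCII, with header fields (index, title, meter, unit note length, key) followed by the melody line, so it can pass through an ordinary text tokenizer. A minimal illustrative example (not taken from the paper):

X:1
T:Example Tune
M:4/4
L:1/8
K:C
|: CDEF GABc | c2 G2 E2 C2 :|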
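Because ChatMusician is an ordinary causal LM over text, generating music reduces to prompting it for ABC notation. Below is a minimal sketch using the Hugging Face transformers library; the model id "m-a-p/ChatMusician", the prompt, and the sampling settings are assumptions for illustration, not specifics confirmed by the abstract.

# A minimal sketch, assuming the released checkpoint loads as a standard
# Hugging Face causal LM; the model id and prompt below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m-a-p/ChatMusician"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)  # plain text tokenizer, no audio codec
model = AutoModelForCausalLM.from_pretrained(model_id)

# Condition generation on a textual description; the model answers in ABC notation.
prompt = "Compose a short folk melody in D major. Answer in ABC notation:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The decoded output is plain text in ABC notation, which can then be rendered to a score or audio with standard ABC tools.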