ChatMusician: Understanding and Generating Music Intrinsically with LLM
February 25, 2024
Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
cs.AI
Abstract
While Large Language Models (LLMs) demonstrate impressive capabilities in
text generation, we find that their ability has yet to be generalized to music,
humanity's creative language. We introduce ChatMusician, an open-source LLM
that integrates intrinsic musical abilities. It is built by continually
pre-training and finetuning LLaMA2 on a text-compatible music representation,
ABC notation, treating music as a second language. ChatMusician can
understand and generate music with a pure text tokenizer, without any external
multi-modal neural structures or tokenizers. Interestingly, endowing the model
with musical abilities does not harm its language abilities; it even achieves
a slightly higher MMLU score. Our model is capable of composing
well-structured, full-length music conditioned on texts, chords, melodies,
motifs, musical forms, etc., surpassing the GPT-4 baseline. On our meticulously
curated college-level music understanding benchmark, MusicTheoryBench,
ChatMusician surpasses LLaMA2 and GPT-3.5 in the zero-shot setting by a
noticeable margin. Our work reveals that LLMs can be excellent compressors for
music, but significant territory remains to be conquered. We release our
4B-token music-language corpus MusicPile, the collected MusicTheoryBench,
code, model, and demo on GitHub.
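
For readers unfamiliar with ABC notation, the text-compatible music representation named in the abstract: a tune is encoded as plain ASCII, with header fields (index, title, meter, unit note length, key) followed by the melody line, so it can pass through an ordinary text tokenizer. A minimal illustrative example (not taken from the paper):

X:1
T:Example Tune
M:4/4
L:1/8
K:C
|: CDEF GABc | c2 G2 E2 C2 :|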
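Because ChatMusician is an ordinary causal LM over text, generating music reduces to prompting it for ABC notation. Below is a minimal sketch using the Hugging Face transformers library; the model id "m-a-p/ChatMusician", the prompt, and the sampling settings are assumptions for illustration, not specifics confirmed by the abstract.

# A minimal sketch, assuming the released checkpoint loads as a standard
# Hugging Face causal LM; the model id and prompt below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m-a-p/ChatMusician"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)  # plain text tokenizer, no audio codec
model = AutoModelForCausalLM.from_pretrained(model_id)

# Condition generation on a textual description; the model answers in ABC notation.
prompt = "Compose a short folk melody in D major. Answer in ABC notation:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The decoded output is plain text in ABC notation, which can then be rendered to a score or audio with standard ABC tools.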