ChatMusician: Understanding and Generating Music Intrinsically with LLM

February 25, 2024
作者: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
cs.AI

Abstract

While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is built on continual pre-training and finetuning of LLaMA2 on a text-compatible music representation, ABC notation, with music treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer, without any external multi-modal neural structures or tokenizers. Interestingly, endowing the model with musical abilities does not harm its language abilities; it even achieves a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music conditioned on texts, chords, melodies, motifs, musical forms, etc., surpassing the GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 in the zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but significant territory remains to be explored. We release our 4B-token music-language corpus MusicPile, the collected MusicTheoryBench, code, model, and demo on GitHub.
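The key point that a pure text tokenizer suffices follows from ABC notation being ordinary ASCII text. Below is a minimal sketch (not the authors' code) that feeds a short ABC tune through a stock LLaMA2 tokenizer; the Hugging Face repo id and the example tune are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: ABC notation is plain text, so a standard LLM tokenizer
# handles it with no music-specific vocabulary or multi-modal encoder.
from transformers import AutoTokenizer

# A short illustrative tune in ABC notation: header fields (X, T, M, K)
# followed by the melody line, all ordinary ASCII characters.
abc_tune = """X:1
T:Example Tune
M:4/4
K:C
C D E F | G A B c | c B A G | F E D C |]"""

# LLaMA2's tokenizer (the paper's base model) is gated on Hugging Face;
# substitute any text tokenizer you have access to.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
token_ids = tokenizer(abc_tune).input_ids
print(f"ABC tune encoded into {len(token_ids)} ordinary text tokens")
```

Because generation is likewise plain text, a decoded ABC string can be rendered to a score or audio with standard ABC tooling (e.g., abc2midi), keeping the model itself purely textual.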