MuPT: A Generative Symbolic Music Pretrained Transformer

Abstract

In this paper, we explore the application of Large Language Models (LLMs) to music pre-training. Although MIDI is the prevalent representation in music modeling, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths and thereby improves model performance in musical composition. To address the misalignment of measures across tracks during generation, we propose Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling sequences of up to 8192 tokens, a length that covers 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, and our open-source contributions offer extensive resources for community-led research.
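
To make the idea of synchronizing measures across tracks concrete, the following is a minimal sketch and not the SMT-ABC format itself, which the abstract does not specify. It assumes that synchronization means grouping the i-th bar of every track together, that tracks are plain ABC voice bodies split on the `|` bar line, and that shorter tracks are padded with a rest. The function names `split_bars` and `interleave_tracks` and the `z4` padding are hypothetical choices made for illustration only.

```python
from itertools import zip_longest


def split_bars(track: str) -> list[str]:
    """Split one ABC voice body into bars on the plain '|' bar line.

    Assumption: only simple bar lines are handled; repeat signs such as
    '|:' or ':|' are outside the scope of this sketch.
    """
    return [bar.strip() for bar in track.split("|") if bar.strip()]


def interleave_tracks(tracks: list[str], rest: str = "z4") -> list[list[str]]:
    """Group the i-th bar of every track together.

    Tracks with fewer bars are padded with `rest` (a hypothetical
    placeholder) so that every group has one entry per track.
    """
    per_track = [split_bars(track) for track in tracks]
    return [list(group) for group in zip_longest(*per_track, fillvalue=rest)]


# Two toy voices with a different number of bars.
melody = "C2 D2 | E2 F2 | G4"
bass = "C,4 | G,4"

grouped = interleave_tracks([melody, bass])
for index, bars in enumerate(grouped, start=1):
    print(f"bar {index}: {bars}")
```

Because each group holds the same bar position for all tracks, a model that generates one group at a time cannot drift out of alignment between tracks, which is the kind of coherence the abstract describes.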
