Personalizable Long-Context Symbolic Music Infilling with MIDI-RWKV

Abstract

Existing work in automatic music generation has primarily focused on end-to-end systems that produce complete compositions or continuations. However, because musical composition is typically an iterative process, such systems make it difficult to engage in the back-and-forth between human and machine that is essential to computer-assisted creativity. In this study, we address the task of personalizable, multi-track, long-context, and controllable symbolic music infilling to enhance the process of computer-assisted composition. We present MIDI-RWKV, a novel model based on the RWKV-7 linear architecture, to enable efficient and coherent musical co-creation on edge devices. We also demonstrate that MIDI-RWKV admits an effective method of finetuning its initial state for personalization in the very-low-sample regime. We evaluate MIDI-RWKV and its state tuning on several quantitative and qualitative metrics, and release model weights and code at https://github.com/christianazinn/MIDI-RWKV.
