Personalizable Long-Context Symbolic Music Infilling with MIDI-RWKV

Abstract

Existing work in automatic music generation has primarily focused on end-to-end systems that produce complete compositions or continuations. However, because musical composition is typically an iterative process, such systems make it difficult to engage in the back-and-forth between human and machine that is essential to computer-assisted creativity. In this study, we address the task of personalizable, multi-track, long-context, and controllable symbolic music infilling to enhance the process of computer-assisted composition. We present MIDI-RWKV, a novel model based on the RWKV-7 linear architecture, to enable efficient and coherent musical cocreation on edge devices. We also demonstrate that MIDI-RWKV admits an effective method of finetuning its initial state for personalization in the very-low-sample regime. We evaluate MIDI-RWKV and its state tuning on several quantitative and qualitative metrics, and release model weights and code at https://github.com/christianazinn/MIDI-RWKV.