Anticipatory Music Transformer

Abstract

We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, in which the controls are a subset of the events themselves and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that an anticipatory model produces accompaniments whose musicality is similar even to that of human-composed music over a 20-second clip.
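
The interleaving idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that controls follow stopping times in the event sequence, so the specific rule used here (a control at time s is emitted right after the first event whose time is at least s - delta) is an assumption, as are the function name `interleave`, the parameter `delta`, and the tuple layout.

```python
from typing import List, Tuple

Timed = Tuple[float, str]          # (time in seconds, label); hypothetical layout
Token = Tuple[str, float, str]     # ("event" | "control", time, label)


def interleave(events: List[Timed], controls: List[Timed], delta: float = 5.0) -> List[Token]:
    """Merge events and controls into one sequence.

    Assumed stopping rule (not taken from the abstract): a control at time s
    is placed immediately after the first event whose time is >= s - delta.
    That position depends only on events seen so far, so it is a stopping time.
    """
    events = sorted(events)
    controls = sorted(controls)
    out: List[Token] = []
    k = 0  # index of the next control that has not been emitted yet
    for t, label in events:
        out.append(("event", t, label))
        # Emit every pending control whose anticipated time (s - delta) has been reached.
        while k < len(controls) and controls[k][0] - delta <= t:
            s, c = controls[k]
            out.append(("control", s, c))
            k += 1
    # Controls never triggered by any event are appended at the end.
    for s, c in controls[k:]:
        out.append(("control", s, c))
    return out


if __name__ == "__main__":
    melody = [(0.0, "a"), (1.0, "b"), (2.0, "c"), (3.0, "d")]
    fixed = [(2.5, "x")]
    for token in interleave(melody, fixed, delta=1.0):
        print(token)
```

In an infilling setting as described above, the controls would be a held-out subset of the events themselves (for example, one instrument's notes), and a model trained on such interleaved sequences would generate the remaining events.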
