Anticipatory Music Transformer

Abstract

We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls so that each control appears after a stopping time in the event sequence. This work is motivated by problems that arise in controlling symbolic music generation. We focus on infilling control tasks, in which the controls are a subset of the events themselves and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models on the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that, over a 20-second clip, an anticipatory model produces accompaniments whose musicality is comparable even to that of music composed by humans.
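
The interleaving idea can be illustrated with a minimal sketch. The abstract does not give the exact stopping rule, so the rule used here is an assumption made for illustration: each control with time s is placed immediately after the first event whose time is at least s - delta, where `delta` is a hypothetical anticipation interval. The names `Token`, `interleave`, and `delta` are likewise illustrative and are not taken from the authors' code.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    time: float        # onset time in seconds
    label: str         # illustrative payload, e.g. a note name
    is_control: bool   # True for control tokens, False for events


def interleave(events: List[Token], controls: List[Token], delta: float) -> List[Token]:
    """Merge events and controls into a single sequence.

    Assumed rule (not stated in the abstract): a control at time s is
    emitted right after the first event with time >= s - delta. Whether
    that point has been reached depends only on the events seen so far,
    which is what makes it a stopping time of the event sequence.
    """
    events = sorted(events, key=lambda e: e.time)
    controls = sorted(controls, key=lambda c: c.time)
    out: List[Token] = []
    k = 0  # index of the next control that has not been placed yet
    for e in events:
        out.append(e)
        # Emit every control whose anticipation threshold has now been reached.
        while k < len(controls) and e.time >= controls[k].time - delta:
            out.append(controls[k])
            k += 1
    # Controls whose threshold no event ever reaches go at the end.
    out.extend(controls[k:])
    return out


def demo_labels() -> List[str]:
    """Small example: events at 0, 1, 2, 6, 7 s; controls at 4 and 9 s; delta = 3 s."""
    events = [Token(t, f"e{t}", False) for t in (0, 1, 2, 6, 7)]
    controls = [Token(t, f"c{t}", True) for t in (4, 9)]
    return [tok.label for tok in interleave(events, controls, delta=3.0)]
```

In this example the control at 4 s follows the event at 1 s and the control at 9 s follows the event at 6 s, so a left-to-right sequence model would see each control a few seconds before the events it constrains. For infilling, the controls would be drawn from the events themselves rather than from a separate list as they are here.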
