GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework
May 18, 2023
Authors: Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan
cs.AI
Abstract
Symbolic music generation aims to create musical notes, which can help users
compose music, such as generating target instrumental tracks from scratch, or
based on user-provided source tracks. Considering the diverse and flexible
combinations of source and target tracks, a unified model capable of
generating arbitrary tracks is crucial. Previous works fail to
address this need due to inherent constraints in music representations and
model architectures. To address this need, we propose a unified representation
and diffusion framework named GETMusic ("GET" stands for GEnerate music
Tracks), which includes a novel music representation named GETScore, and a
diffusion model named GETDiff. GETScore represents notes as tokens and
organizes them in a 2D structure, with tracks stacked vertically and
progressing horizontally over time. During training, tracks are randomly
selected as either the target or source. In the forward process, target tracks
are corrupted by masking their tokens, while source tracks remain as ground
truth. In the denoising process, GETDiff learns to predict the masked target
tokens, conditioned on the source tracks. With separate tracks in GETScore and
the non-autoregressive behavior of the model, GETMusic can explicitly control
the generation of any target tracks from scratch or conditioned on source
tracks. We conduct experiments on music generation involving six instrumental
tracks, resulting in a total of 665 combinations. GETMusic provides
high-quality results across diverse combinations and surpasses prior works
proposed for some specific combinations.
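
The following is a minimal Python sketch of two ideas from the abstract, not the authors' implementation. It assumes that each of the six tracks takes one of three roles (source, target, or unused) and that a valid combination needs at least one target; under that assumption the count comes out to 3^6 - 2^6 = 665, matching the figure above. It also illustrates the forward process on a toy 2D grid (rows are tracks, columns are time steps), where only target rows are masked. The names `MASK`, `count_combinations`, and `corrupt`, the role labels, and the token values are all hypothetical.

```python
import itertools
import random

MASK = -1  # hypothetical id for the mask token (assumption, not from the paper)


def count_combinations(n_tracks):
    """Count role assignments over n_tracks with at least one target track.

    Assumption: every track is independently 'source', 'target', or 'unused'.
    """
    roles = ("source", "target", "unused")
    return sum(
        1
        for assignment in itertools.product(roles, repeat=n_tracks)
        if "target" in assignment
    )


def corrupt(score, roles, mask_ratio, rng):
    """Mask tokens in target rows; copy every other row unchanged.

    score      : list of rows, one row of token ids per track
    roles      : one role string per row ('target' rows get masked)
    mask_ratio : probability of masking each target token
    rng        : a random.Random instance, so results are reproducible
    """
    corrupted = []
    for row, role in zip(score, roles):
        if role == "target":
            corrupted.append(
                [MASK if rng.random() < mask_ratio else tok for tok in row]
            )
        else:
            # Source rows stay as ground truth.
            corrupted.append(list(row))
    return corrupted


if __name__ == "__main__":
    print(count_combinations(6))  # 665

    # Toy grid: two tracks, four time steps (token values are made up).
    toy_score = [[60, 62, 64, 65], [36, 36, 43, 43]]
    toy_roles = ["target", "source"]
    print(corrupt(toy_score, toy_roles, mask_ratio=1.0, rng=random.Random(0)))
```

With `mask_ratio=1.0` every target token is masked, which corresponds to generating the target track from the source track alone; a denoising model would then be trained to recover the masked tokens given the untouched source rows.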