GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework
May 18, 2023
作者: Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan
cs.AI
Abstract
Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrumental tracks either from scratch or based on user-provided source tracks. Given the diverse and flexible combinations of source and target tracks, a unified model capable of generating arbitrary tracks is crucial. Previous works fail to meet this need due to inherent constraints in their music representations and model architectures. To address this, we propose a unified representation and diffusion framework named GETMusic (`GET' stands for GEnerate music Tracks), which includes a novel music representation named GETScore and a diffusion model named GETDiff. GETScore represents notes as tokens and organizes them in a 2D structure, with tracks stacked vertically and progressing horizontally over time. During training, tracks are randomly selected as either target or source. In the forward process, target tracks are corrupted by masking their tokens, while source tracks remain as ground truth; in the denoising process, GETDiff learns to predict the masked target tokens, conditioned on the source tracks. Thanks to the separate tracks in GETScore and the non-autoregressive behavior of the model, GETMusic can explicitly control the generation of any target tracks, either from scratch or conditioned on source tracks. We conduct music generation experiments involving six instrumental tracks, covering a total of 665 combinations. GETMusic delivers high-quality results across these diverse combinations and surpasses prior works tailored to specific combinations.
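
To make the described representation and masking-based forward process concrete, below is a minimal NumPy sketch of a GETScore-like token grid in which randomly chosen target tracks are corrupted with a mask token while source tracks stay intact. This is an illustration under assumptions, not the authors' implementation: the mask id, token vocabulary, and grid dimensions are hypothetical.

```python
# Minimal sketch (not the authors' code) of a GETScore-like 2D token grid
# and the mask-based corruption described in the abstract.
import numpy as np

MASK = 0          # hypothetical id reserved for the mask token
NUM_TRACKS = 6    # e.g. the six instrumental tracks mentioned in the paper
NUM_STEPS = 32    # number of time positions (columns); illustrative only

rng = np.random.default_rng(0)

# GETScore-like grid: tracks stacked vertically, time progressing horizontally.
# Each cell holds a note token id (1..127 here, purely illustrative).
score = rng.integers(1, 128, size=(NUM_TRACKS, NUM_STEPS))

# Randomly split tracks into source (kept as ground truth) and target (to corrupt).
is_target = rng.random(NUM_TRACKS) < 0.5
is_target[rng.integers(NUM_TRACKS)] = True  # ensure at least one target track

# Forward process: corrupt target tracks by replacing their tokens with MASK,
# while source tracks remain untouched as the conditioning signal.
corrupted = score.copy()
corrupted[is_target, :] = MASK

# A denoising model (GETDiff in the paper) would then be trained to predict
# the original tokens of the masked cells, conditioned on the visible source rows.
print("target tracks:", np.where(is_target)[0])
print(corrupted)
```

In this sketch the whole target row is masked at once; an actual diffusion schedule would corrupt tokens progressively over timesteps, but the source/target split and the non-autoregressive prediction of masked cells follow the behavior the abstract describes.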